To make precise the sense in which nature fails to respect classical physics, one requires a formal
notion of classicality. Ideally, such a notion should be defined operationally, so that it can be
subjected to a direct experimental test, and it should be applicable in a wide variety of experimental
scenarios, so that it can cover the breadth of phenomena that are thought to defy classical
understanding. Bell’s notion of local causality fulfills the first criterion but not the second. The
notion of noncontextuality fulfills the second criterion, but it is a long-standing question whether it
can be made to fulfill the first. Previous attempts to experimentally test noncontextuality have all
presumed certain idealizations that do not hold in real experiments, namely, noiseless measurements
and exact operational equivalences. We here show how to devise tests that are free of these
idealizations. We also perform a photonic implementation of one such test that rules out noncontextual
models with high confidence.


I. INTRODUCTION 

Making precise the manner in which a quantum world differs from a classical one is a surprisingly difficult task. The most successful attempt, due to Bell [1], shows a conflict between quantum theory and a feature of classical theories termed local causality, which asserts that no causal influences propagate faster than light. But the latter assumption can only be tested for scenarios wherein there are two or more systems that are space-like separated. And yet few believe that this highly specialized situation is the only point where the quantum departs from the classical. A leading candidate for a notion of nonclassicality with a broader scope is the failure of quantum theory to admit of a noncontextual model, as proven by Kochen and Specker [2]. Recent work has highlighted how this notion lies at the heart of many phenomena that are taken to be distinctly quantum: the fact that quasi-probability representations go negative [3, 4], the existence of quantum advantages for cryptography [5] and for computation EHH], and the possibility of anomalous weak values [5].

An experimental refutation of noncontextuality would 
demonstrate that the conflict with noncontextual models 
is not only a feature of quantum theory, but of nature 
itself, and hence also of any successor to quantum theory. 
The requirements for such an experimental test, however, 
have been a subject of much controversy [TOHIH] . 

A fundamental problem with most proposals for testing noncontextuality [nHi3, and experiments performed to date [21–32], is that they assume that measurements have a deterministic response in the noncontextual model. It has been shown that this can only be justified under the idealization that measurements are noiseless [33], which is never satisfied precisely by any real experiment. We here show how to contend with such noise.

Another critical problem with previous proposals is the fact that the assumption of noncontextuality can only be brought to bear when two measurement events (an event is a measurement and an outcome) are operationally equivalent, which occurs when the two events are assigned exactly the same probability by all preparation procedures [33]; in this case they are said to differ only by the measurement context. In a real experiment, however, one never achieves the ideal of precise operational equivalence. Previous work on testing noncontextuality, including the only experiment to have circumvented the problem of noisy measurements (by focusing on preparations) [5], has failed to provide a satisfactory account of how the deviation from strict operational equivalence should be accounted for in the interpretation of the results. We here demonstrate a general technique that allows one to circumvent this problem.

For Bell’s notion of local causality, the theoretical work of Clauser et al. [35] was critical to enabling an experimental test without unwarranted idealizations, such as the perfect correlations presumed in Bell’s original proof [1]. Similarly, the theoretical innovations we introduce here make it possible for the first time to subject noncontextuality to an experimental test without the idealizations described above. We report on a quantum-optical experiment of this kind, the results of which rule out noncontextual models with high confidence.


II. A NONCONTEXTUALITY INEQUALITY

According to the operational approach proposed in ref. [34], to assume noncontextuality is to assume a constraint on model-construction, namely, that if procedures are statistically equivalent at the operational level then they ought to be statistically equivalent in the underlying model.

Operationally, a system is associated with a set $\mathcal{M}$ (resp. $\mathcal{P}$) of physically possible measurement (resp. preparation) procedures. An operational theory specifies the possibilities for the conditional probabilities $\{p(X|P,M) : P \in \mathcal{P}, M \in \mathcal{M}\}$, where $X$ ranges over the outcomes of measurement $M$. In an ontological model of such a theory, the causal influence of the preparation on the measurement outcome is mediated by the ontic state of the system, that is, a full specification of the system’s physical properties. We denote the space of ontic states by $\Lambda$. It is presumed that when the preparation $P$ is implemented, the ontic state of the system, $\lambda \in \Lambda$, is sampled from a probability distribution $\mu(\lambda|P)$, and when the system is subjected to the measurement $M$, the outcome $X$ is distributed as $\xi(X|M,\lambda)$. Finally, for the model to reproduce the experimental statistics, we require that

$$\sum_{\lambda\in\Lambda} \xi(X|M,\lambda)\,\mu(\lambda|P) = p(X|M,P). \qquad (1)$$
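As a concrete reading of Eq. (1), the following minimal sketch builds a toy ontological model with a finite set of ontic states and evaluates the sum over them. The model itself (two ontic states, the preparation names `P_a` and `P_b`, the measurement name `M` and all numerical values) is hypothetical and chosen only for illustration; it is not taken from the experiment.

```python
from fractions import Fraction as F

# Hypothetical toy model (not from the experiment): two ontic states,
# two preparations and one binary-outcome measurement.
ONTIC_STATES = ("lam0", "lam1")

# mu[P][lam]: probability that preparation P leaves the system in ontic state lam.
mu = {
    "P_a": {"lam0": F(3, 4), "lam1": F(1, 4)},
    "P_b": {"lam0": F(1, 4), "lam1": F(3, 4)},
}

# xi[M][lam][x]: probability that measurement M returns outcome x on ontic state lam.
xi = {
    "M": {
        "lam0": {0: F(9, 10), 1: F(1, 10)},
        "lam1": {0: F(1, 5), 1: F(4, 5)},
    },
}

def predicted_probability(x, m, p):
    """Left-hand side of Eq. (1): sum over ontic states of xi(x|m,lam) * mu(lam|p)."""
    return sum(xi[m][lam][x] * mu[p][lam] for lam in ONTIC_STATES)

for prep in mu:
    dist = [predicted_probability(x, "M", prep) for x in (0, 1)]
    print(prep, dist)
```

An ontological model is adequate for an operational theory when these predicted values agree with the operational probabilities $p(X|M,P)$ for every preparation and measurement.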

A general discussion of the assumption of noncontextuality is provided in Appendix A, but one can understand the concept through the concrete example we consider here (based on a construction from Sec. V of ref. [34]).

Suppose there is a measurement procedure, $M_*$, that is operationally indistinguishable from a fair coin flip: it always gives a uniformly random outcome regardless of the preparation procedure,

$$p(X = 0,1|M_*,P) = \tfrac{1}{2}, \quad \forall P \in \mathcal{P}. \qquad (2)$$

In this case, noncontextuality dictates that in the underlying model, the measurement should also give a uniformly random outcome regardless of the ontic state of the system,

$$\xi(X = 0,1|M_*,\lambda) = \tfrac{1}{2}, \quad \forall \lambda \in \Lambda. \qquad (3)$$

In other words, because $M_*$ appears operationally to be just like a coin flip, noncontextuality dictates that physically it must be just like a coin flip.

The second application of noncontextuality is essentially a time-reversed version of the first. Suppose there is a triple of preparation procedures, $P_1$, $P_2$ and $P_3$, that are operationally indistinguishable from one another: no measurement reveals any information about which of these preparations was implemented,

$$\forall M \in \mathcal{M}: \; p(X|M,P_1) = p(X|M,P_2) = p(X|M,P_3). \qquad (4)$$

In this case, noncontextuality dictates that in the underlying model, the ontic state of the system does not contain any information about which of these preparation procedures was implemented,

$$\forall \lambda \in \Lambda: \; \mu(\lambda|P_1) = \mu(\lambda|P_2) = \mu(\lambda|P_3). \qquad (5)$$

In other words, because it is impossible, operationally, to extract such information, noncontextuality dictates that physically, the information is not present in the system.


Suppose that $M_*$ can be realized as a uniform mixture of three other binary-outcome measurements, denoted $M_1$, $M_2$ and $M_3$. That is, one implements $M_*$ by uniformly sampling $t \in \{1,2,3\}$, implementing $M_t$, then outputting its outcome as the outcome of $M_*$. Finally, suppose that each preparation $P_t$ can be realized as the equal mixture of two other preparation procedures, denoted $P_{t,0}$ and $P_{t,1}$.

Consider implementing $M_t$ on $P_{t,b}$, and consider the average degree of correlation between the measurement outcome $X$ and the preparation variable $b$:

$$A \equiv \frac{1}{6}\sum_{t\in\{1,2,3\}}\sum_{b\in\{0,1\}} p(X = b|M_t,P_{t,b}). \qquad (6)$$

We now show that noncontextuality implies a nontrivial bound on $A$.

The proof is by contradiction. Suppose that $A = 1$. In order to have this perfect correlation on average, we require perfect correlation in each term, which implies that for all ontic states $\lambda$ assigned nonzero probability by $P_{t,b}$, the measurement $M_t$ must respond deterministically with the $X = b$ outcome. Given that $P_t$ is an equal mixture of $P_{t,0}$ and $P_{t,1}$, it follows that for all ontic states $\lambda$ assigned nonzero probability by $P_t$, the measurement $M_t$ must have a deterministic response.

But Eq. (5) (which follows from the assumption of noncontextuality) asserts that the preparations $P_1$, $P_2$ and $P_3$ must assign nonzero probability to precisely the same set of ontic states. Therefore, to achieve perfect correlation on average, each measurement must respond deterministically to all the ontic states in this set.

Now note that by the definition of $M_*$, the probability of its outcome $X = b$ is $\xi(X = b|M_*,\lambda) = \frac{1}{3}\sum_{t\in\{1,2,3\}} \xi(X = b|M_t,\lambda)$. But then Eq. (3) (which follows from the assumption of noncontextuality) says

$$\frac{1}{3}\sum_{t\in\{1,2,3\}} \xi(X = b|M_t,\lambda) = \frac{1}{2}. \qquad (7)$$

For each deterministic assignment of values, $(\xi(X = b|M_1,\lambda), \xi(X = b|M_2,\lambda), \xi(X = b|M_3,\lambda)) \in \{(0,0,0), (0,0,1), \ldots, (1,1,1)\}$, the constraint of Eq. (7) is violated, because the average of three values drawn from $\{0,1\}$ is a multiple of 1/3 and therefore never equals 1/2. It follows, therefore, that for a given $\lambda$, one of $M_1$, $M_2$ or $M_3$ must fail to have a deterministic response, contradicting the requirement for perfect correlation on average. This concludes the proof.

The precise (i.e. tight) bound is

$$A \le \frac{5}{6}, \qquad (8)$$

as we demonstrate in Appendix B. This is our noncontextuality inequality.
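The value 5/6 can be checked numerically under a reduction that is an assumption of this sketch rather than a statement of the appendix proof. Writing $\xi_t = \xi(X=0|M_t,\lambda)$ for a fixed ontic state, Eq. (5) implies that this state can contribute at most $\frac{1}{3}\sum_t \max(\xi_t, 1-\xi_t)$ to $A$, while Eq. (7) imposes $\xi_1+\xi_2+\xi_3 = 3/2$. The code maximises the former subject to the latter on a rational grid; the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

def per_state_bound(xis):
    """(1/3) * sum_t max(xi_t, 1 - xi_t): best correlation one ontic state can support."""
    return sum(max(x, 1 - x) for x in xis) / 3

def max_over_grid(steps=20):
    """Maximise per_state_bound over grid points obeying xi_1 + xi_2 + xi_3 = 3/2 (Eq. 7)."""
    best, arg = F(0), None
    for i in range(steps + 1):
        for j in range(steps + 1):
            x1, x2 = F(i, steps), F(j, steps)
            x3 = F(3, 2) - x1 - x2          # fixed by the constraint
            if 0 <= x3 <= 1:
                value = per_state_bound((x1, x2, x3))
                if value > best:
                    best, arg = value, (x1, x2, x3)
    return best, arg

# No deterministic (0/1-valued) triple can satisfy the constraint of Eq. (7).
deterministic_ok = any(sum(v) == F(3, 2) for v in product((0, 1), repeat=3))

best, arg = max_over_grid()
print(deterministic_ok, best, arg)
```

The maximum is attained when one measurement responds with probability 1/2 and the other two respond deterministically with opposite outcomes, which is why the bound sits strictly between 1/2 and 1.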

III. QUANTUM VIOLATION OF THE 
INEQUALITY 

Quantum theory predicts there is a set of preparations and measurements on a qubit having the supposed properties and achieving $A = 1$, the logical maximum. Take the $M_t$ to be represented by the observables $\vec{\sigma}\cdot\hat{n}_t$, where $\vec{\sigma}$ is the vector of Pauli operators and the unit vectors $\{\hat{n}_1,\hat{n}_2,\hat{n}_3\}$ are separated by 120° in the $x$-$z$ plane of the Bloch sphere of qubit states [36]. The $P_{t,b}$ are the eigenstates of these observables, where we associate the positive eigenstate $|{+}\hat{n}_t\rangle\langle{+}\hat{n}_t|$ with $b = 0$. To see that the statistical equivalence of Eq. (2) is satisfied, it suffices to note that

$$\tfrac{1}{3}|{+}\hat{n}_1\rangle\langle{+}\hat{n}_1| + \tfrac{1}{3}|{+}\hat{n}_2\rangle\langle{+}\hat{n}_2| + \tfrac{1}{3}|{+}\hat{n}_3\rangle\langle{+}\hat{n}_3| = \tfrac{1}{2}\mathbb{I}, \qquad (9)$$

and to recall that for any density operator $\rho$, $\mathrm{tr}(\rho\,\tfrac{1}{2}\mathbb{I}) = \tfrac{1}{2}$. To see that the statistical equivalence of Eq. (4) is satisfied, it suffices to note that for all pairs $t, t' \in \{1,2,3\}$,

$$\tfrac{1}{2}|{+}\hat{n}_t\rangle\langle{+}\hat{n}_t| + \tfrac{1}{2}|{-}\hat{n}_t\rangle\langle{-}\hat{n}_t| = \tfrac{1}{2}|{+}\hat{n}_{t'}\rangle\langle{+}\hat{n}_{t'}| + \tfrac{1}{2}|{-}\hat{n}_{t'}\rangle\langle{-}\hat{n}_{t'}|, \qquad (10)$$

which asserts that the average density operator for each value of $t$ is the same, and therefore leads to precisely the same statistics for all measurements. Finally, it is clear that the outcome of the measurement of $\vec{\sigma}\cdot\hat{n}_t$ is necessarily perfectly correlated with whether the state was $|{+}\hat{n}_t\rangle\langle{+}\hat{n}_t|$ or $|{-}\hat{n}_t\rangle\langle{-}\hat{n}_t|$, so that $A = 1$.
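These three claims can be verified with a few lines of arithmetic in the Bloch-vector picture, where a qubit state with Bloch vector $\vec{r}$ gives the $+1$ outcome of $\vec{\sigma}\cdot\hat{n}$ with probability $(1+\hat{n}\cdot\vec{r})/2$. The sketch below assumes that standard representation; the helper names and the arbitrary test state are illustrative choices, not part of the paper.

```python
import math

def bloch(theta):
    """Unit Bloch vector in the x-z plane at polar angle theta (radians)."""
    return (math.sin(theta), 0.0, math.cos(theta))

def prob0(axis, state):
    """Probability of outcome 0 (the +1 eigenvalue of sigma.axis) for Bloch vector `state`."""
    return 0.5 * (1.0 + sum(a * s for a, s in zip(axis, state)))

def average(vectors):
    """Bloch vector of the equal mixture of the given states."""
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(3))

# Measurement axes n_1, n_2, n_3 separated by 120 degrees.
axes = {t: bloch(2.0 * math.pi * (t - 1) / 3.0) for t in (1, 2, 3)}
# P_{t,0} is the +n_t eigenstate and P_{t,1} the -n_t eigenstate.
preps = {(t, b): tuple((-1.0) ** b * c for c in axes[t]) for t in axes for b in (0, 1)}

# Eq. (9) / Eq. (2): the uniform mixture M_* is a fair coin on any state.
test_state = (0.3, -0.2, 0.5)   # arbitrary Bloch vector inside the unit ball
coin = sum(prob0(axes[t], test_state) for t in axes) / 3.0

# Eq. (10) / Eq. (4): the three equal mixtures P_t share one Bloch vector.
midpoints = [average([preps[(t, 0)], preps[(t, 1)]]) for t in axes]

# Eq. (6): average correlation between outcome X and preparation label b.
def p_correct(t, b):
    p0 = prob0(axes[t], preps[(t, b)])
    return p0 if b == 0 else 1.0 - p0

A = sum(p_correct(t, b) for t in axes for b in (0, 1)) / 6.0
print(coin, midpoints, A)
```

In this picture Eq. (9) is the statement that the three axes sum to the zero vector, and Eq. (10) is the statement that each antipodal pair averages to the centre of the Bloch ball.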

These quantum measurements and preparations are what we seek to implement experimentally, so we refer to them as ideal, and denote them by $M^{\mathrm{i}}_t$ and $P^{\mathrm{i}}_{t,b}$.

Note that our noncontextuality inequality can accommodate noise in both the measurements and the preparations, up to the point where the average of $p(X = b|M_t,P_{t,b})$ drops below 5/6. It is in this sense that our inequality does not presume the idealization of noiseless measurements.


IV. CONTENDING WITH THE LACK OF 
EXACT OPERATIONAL EQUIVALENCE 


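The actual preparations and measurements in the experiment, which we call the primary procedures and denote by $P^{\mathrm{p}}_{1,0}$, $P^{\mathrm{p}}_{1,1}$, $P^{\mathrm{p}}_{2,0}$, $P^{\mathrm{p}}_{2,1}$, $P^{\mathrm{p}}_{3,0}$, $P^{\mathrm{p}}_{3,1}$ and $M^{\mathrm{p}}_1$, $M^{\mathrm{p}}_2$, $M^{\mathrm{p}}_3$, necessarily deviate from the ideal versions and consequently their mixtures, that is, $P^{\mathrm{p}}_1$, $P^{\mathrm{p}}_2$, $P^{\mathrm{p}}_3$ and $M^{\mathrm{p}}_*$, fail to achieve strict equality in Eqs. (2) and (4).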

We solve this problem as follows. From the outcome probabilities on the six primary preparations, one can infer the outcome probabilities on the entire family of probabilistic mixtures of these. It is possible to find within this family many sets of six preparations, $P^{\mathrm{s}}_{1,0}$, $P^{\mathrm{s}}_{1,1}$, $P^{\mathrm{s}}_{2,0}$, $P^{\mathrm{s}}_{2,1}$, $P^{\mathrm{s}}_{3,0}$, $P^{\mathrm{s}}_{3,1}$, which define mixed preparations $P^{\mathrm{s}}_1$, $P^{\mathrm{s}}_2$, $P^{\mathrm{s}}_3$ that satisfy the operational equivalences of Eq. (4) exactly. We call the $P^{\mathrm{s}}_{t,b}$ secondary preparations. We can define secondary measurements $M^{\mathrm{s}}_1$, $M^{\mathrm{s}}_2$, $M^{\mathrm{s}}_3$ and their uniform mixture $M^{\mathrm{s}}_*$ in a similar fashion. The essence of our approach, then, is to identify such secondary sets of procedures and use these to calculate $A$. If quantum theory is correct, then we expect to get a value of $A$ close to 1 if and only if we can find suitable secondary procedures that are close to the ideal versions.

FIG. 1. Illustration of our solution to the problem of the failure to achieve strict operational equivalences of preparations (under the simplifying assumption that these are confined to the $x$-$z$ plane of the Bloch sphere). For a given pair, $P_{t,0}$ and $P_{t,1}$, the midpoint along the line connecting the corresponding points represents their equal mixture, $P_t$. a. The target preparations $P^{\mathrm{i}}_{t,b}$, with the coincidence of the midpoints of the three lines illustrating that they satisfy the operational equivalence (4) exactly. b. Illustration of how errors in the experiment (exaggerated in magnitude) will imply that the realized preparations $P^{\mathrm{p}}_{t,b}$ (termed primary) will deviate from the ideal. The lines indicate that not only do these preparations fail to satisfy the operational equivalence (4), but since the lines do not meet, no mixtures of the $P^{\mathrm{p}}_{t,0}$ and $P^{\mathrm{p}}_{t,1}$ can be found at a single point independent of $t$. The set of preparations corresponding to probabilistic mixtures of the $P^{\mathrm{p}}_{t,b}$ is depicted by the grey region. c. Secondary preparations $P^{\mathrm{s}}_{t,b}$ have been chosen from this grey region, with the coincidence of the midpoints of the three lines indicating that the operational equivalence (4) has been restored. Note that we require only that the mixtures of the three pairs of preparations be the same, not that they correspond to the completely mixed state.

To test the hypothesis of noncontextuality, one must allow for the possibility that the experimental procedures do not admit of a quantum model. Nonetheless, for pedagogical purposes, we will first provide the details of how one would construct the secondary sets under the assumption that all the experimental procedures do admit of a quantum model.

In Fig. 1, we describe the construction of secondary preparations in a simplified example of six density operators that deviate from the ideal states only within the $x$-$z$ plane of the Bloch sphere.

In practice, the six density operators realized in the experiment will not quite lie in a plane. We use the same idea to contend with this, but with one refinement: we supplement our set of ideal preparations with two additional ones, denoted $P^{\mathrm{i}}_{4,0}$ and $P^{\mathrm{i}}_{4,1}$, corresponding to the two eigenstates of $\vec{\sigma}\cdot\hat{y}$. The two procedures that are actually realized in the experiment are denoted $P^{\mathrm{p}}_{4,0}$ and $P^{\mathrm{p}}_{4,1}$ and are considered supplements to the primary set. We then search for our six secondary preparations among the probabilistic mixtures of this supplemented set of primaries rather than among the probabilistic mixtures of the original set. Without this refinement, it can happen that one cannot find six secondary preparations that are close to the ideal versions, as we explain in Appendix C.

The scheme for defining secondary measurement procedures is also described in Appendix C. Analogously to the case of preparations, one contends with deviations from the plane by supplementing the ideal set with the observable $\vec{\sigma}\cdot\hat{y}$.

Note that in order to identify which density operators have been realized in an experiment, the set of measurements must be complete for state tomography [37]. Similarly, to identify which sets of effects have been realized, the set of preparations must be complete for measurement tomography [38]. However, the original ideal sets fail to be tomographically complete because they are restricted to a plane of the Bloch sphere, and an effective way to complete them is to add the observable $\vec{\sigma}\cdot\hat{y}$ to the measurements and its eigenstates to the preparations. Therefore, even if we did not already need to supplement these ideal sets for the purpose of providing greater leeway in the construction of the secondary procedures, we would be forced to do so in order to ensure that one can achieve tomography.

The relevant procedure here is not quite state tomography in the usual sense, since we want to allow for systematic errors in the measurements as well as the preparations. Hence the task [39, 40] is to find a set of qubit density operators, $\rho_{t,b}$, and POVMs, $\{E_{X|t}\}$, that together make the measured data as likely as possible (we cannot expect $\mathrm{tr}(\rho_{t,b}E_{X|t})$ to match the measured relative frequencies exactly due to the finite number of experimental runs).
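The objective of such a fit can be sketched as a binomial log-likelihood of the observed counts given candidate states and effects. The example below is a simplified illustration and not the fitting routine used in the experiment: it assumes unbiased two-outcome effects of the form $(\mathbb{I} \pm \vec{m}\cdot\vec{\sigma})/2$, it only evaluates the objective rather than optimising it, and the counts are made up.

```python
import math

def prob0(axis, state):
    """tr(rho E_0) in Bloch form for the effect E_0 = (I + axis.sigma)/2."""
    return 0.5 * (1.0 + sum(a * s for a, s in zip(axis, state)))

def log_likelihood(counts, axes, states, eps=1e-12):
    """Sum over (measurement, preparation) pairs of n0*log(p0) + n1*log(1 - p0).

    counts[(m, p)] = (n0, n1) are the observed numbers of outcomes 0 and 1.
    Probabilities are clipped to [eps, 1 - eps] to keep the logarithms finite.
    """
    total = 0.0
    for (m, p), (n0, n1) in counts.items():
        q = min(max(prob0(axes[m], states[p]), eps), 1.0 - eps)
        total += n0 * math.log(q) + n1 * math.log(1.0 - q)
    return total

# Hypothetical data: one measurement along z, one preparation, 1000 detections.
counts = {("z", "a"): (900, 100)}
axes = {"z": (0.0, 0.0, 1.0)}

matching = log_likelihood(counts, axes, {"a": (0.0, 0.0, 0.8)})   # predicts p0 = 0.9
mismatch = log_likelihood(counts, axes, {"a": (0.0, 0.0, 0.6)})   # predicts p0 = 0.8
print(matching, mismatch)
```

A full fit would vary both the states and the effects to maximise this quantity, which is what allows systematic errors in the measurements to be absorbed alongside those in the preparations.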

To analyze our data in a manner that does not prejudice which model (noncontextual, quantum, or otherwise) does justice to it, we must search for representations of the preparations and measurements not amongst density operators and sets of effects, but rather their more abstract counterparts in the formalism of generalised probabilistic theories [41, 42], called generalised states and effects. The assumption that the system is a qubit is replaced by the strictly weaker assumption that three two-outcome measurements are tomographically complete. (In generalised probabilistic theories, a set of measurements is called tomographically complete if its statistics suffice to determine the state.) We take these states and effects as estimates of our primary preparations and measurements, and we define our estimate of the secondary procedures in terms of these, which in turn are used to calculate our estimate for $A$. We explain how the raw data is fit to a set of generalised states and effects in the Appendix. We characterize the quality of this fit with a $\chi^2$ test.

FIG. 2. The experimental setup. Polarization-separable photon pairs are created via parametric downconversion, and detection of a photon at $D_h$ heralds the presence of a single photon. The polarization state of this photon is prepared with a polarizer and two waveplates (prep). A single-mode fibre is a spatial filter that decouples beam deflections caused by the state-preparation and measurement waveplates from the coupling efficiency into the detectors. Three waveplates (comp) are set to undo the polarization rotation caused by the fibre. Two waveplates (meas), a polarizing beamsplitter, and detectors $D_r$ and $D_t$ perform a two-outcome measurement on the state. PPKTP, periodically poled potassium titanyl phosphate; PBS, polarizing beamsplitter; GT-PBS, Glan-Taylor polarizing beamsplitter; IF, interference filter; HWP, half-waveplate; QWP, quarter-waveplate.


V. EXPERIMENT 

We use the polarization of single photons to test our noncontextuality inequality. The set-up, shown in Fig. 2, consists of a heralded single-photon source [43–45], polarization-state preparation and polarization measurement. We generate photons using spontaneous parametric downconversion and prepare eight polarization states using a polarizer followed by a quarter-wave plate (QWP) and half-wave plate (HWP). The four polarization measurements are performed using a HWP, QWP and polarizing beamsplitter. Photons are counted after the beamsplitter and the counts are taken to be fair samples of the true probabilities for obtaining each outcome for every preparation-measurement pair. Since the orientations of the preparation waveplates lead to small deflections of the beam, some information about the preparation gets encoded spatially, and similarly the measurement waveplates create sensitivity to spatial information; a single-mode fibre deals with both issues. For a single experimental run we implement each preparation-measurement pair for 4 s (approximately 10® counts). We performed 100 such runs.

FIG. 3. For every measurement-preparation pair, the probability of obtaining outcome 0 in the measurement. Red bars are relative frequencies calculated from the raw counts, blue bars are our estimates of the outcome probabilities of the primary measurements on the primary preparations obtained from a best-fit of the raw data, and green bars are our estimates of the outcome probabilities of the secondary measurements on the secondary preparations. The shaded grey background highlights the measurements and preparations for which secondary procedures were found. Error bars are not visible on this scale, neither are discrepancies between the obtained probabilities and the ideal values thereof, which are at most 0.013; statistical error due to Poissonian count statistics is at most 0.002.


Preparations are represented by vectors of raw data specifying the relative frequencies of outcomes for each measurement, uncertainties on which are calculated assuming Poissonian uncertainty in the photon counts. For each run, the raw data is fit to a set of states and effects in a GPT in which three binary-outcome measurements are tomographically complete. This is done using a total weighted least-squares method miz]. The average $\chi^2$ over the 100 runs is 3.9 ± 0.3, agreeing with the expected value of 4, and indicating that the model fits the data well. The fit returns a 4 × 8 matrix that serves to define the 8 GPT states and 4 GPT effects, which are our estimates of the primary preparations and measurements. The column of this matrix associated to the $t, b$ preparation, which we denote $\mathbf{P}^{\mathrm{p}}_{t,b}$, specifies our estimate of the probabilities assigned by the primary preparation $P^{\mathrm{p}}_{t,b}$ to outcome ‘0’ of each of the primary measurements. The raw and primary data are compared in Fig. 3. The probabilities are indistinguishable on this scale. We plot the probabilities for $P_1$, $P_2$, and $P_3$ in Fig. 4 on a much finer scale. We then see that the primary data are within error of the raw data, as expected given the high quality of the fit to the GPT. However, the operational equivalences of Eqs. (2) and (4) are not satisfied by our estimates of the primary preparations and measurements, illustrating the need for secondary procedures.

FIG. 4. Operational statistics for raw, primary, and secondary preparations and measurements, averaged over 100 experimental runs. a. The probabilities of the primary measurements (blue bars) differ depending on which of the three mixed preparations $P^{\mathrm{p}}_1$, $P^{\mathrm{p}}_2$, and $P^{\mathrm{p}}_3$ is measured. These probabilities are within error of the raw data (red bars), indicating that a GPT in which three two-outcome measurements are tomographically complete fits the data well. Probabilities for primary measurements on the secondary preparations (green bars) are independent of the preparation, hence the secondary preparations satisfy Eq. (4). Note that one expects these probabilities to deviate from 0.5. In the example of Fig. 1, this corresponds to the fact that the intersection of the lines is not the completely mixed state. b. Outcome probabilities of measurement $M_*$ on the eight preparations. Red bars are raw data, blue bars are the measurement $M^{\mathrm{p}}_*$ on the primary preparations, and green bars are $M^{\mathrm{s}}_*$ on the primary preparations. Regardless of the input state, $M^{\mathrm{s}}_*$ returns outcome 0 with probability 0.5, hence it is operationally indistinguishable from a fair-coin flip (Eq. (2)). Error bars in all plots are calculated assuming Poissonian count statistics.
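The expected fit statistic of 4 quoted above is consistent with a simple count of degrees of freedom. The counting below is an assumption of this sketch rather than a derivation given in the text: each preparation is described by its outcome-0 probabilities on three fiducial measurements, and every further measurement is an affine function of those three (three coefficients plus an offset).

```python
def gpt_fit_degrees_of_freedom(n_meas=4, n_prep=8, n_fiducial=3):
    """Data points minus free model parameters for the rank-constrained probability table."""
    data_points = n_meas * n_prep                               # one outcome-0 frequency per pair
    state_params = n_prep * n_fiducial                          # fiducial probabilities per state
    effect_params = (n_meas - n_fiducial) * (n_fiducial + 1)    # affine map per extra measurement
    return data_points - (state_params + effect_params)

print(gpt_fit_degrees_of_freedom())
```

With four measurements and eight preparations this gives 32 data points against 28 parameters. Had only three measurements been performed, the count would be zero and the fit could not test the tomographic-completeness assumption at all.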

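We define the six secondary preparations as probabilistic mixtures of the eight primaries: $P^{\mathrm{s}}_{t,b} = \sum_{t'=1}^{4}\sum_{b'=0}^{1} u^{t,b}_{t',b'} P^{\mathrm{p}}_{t',b'}$, where the $u^{t,b}_{t',b'}$ are the weights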

FIG. 5. a, Values of the six degrees of correlation in Eq. (6), averaged over 100 experimental runs. b, Average measured value for $A$ contrasted with the noncontextual bound $A = 5/6$. We find $A = 0.99709 \pm 0.00007$, which violates the noncontextual bound by 2300σ. Error bars in both plots represent the standard deviation in the average of the measured values over the 100 experimental runs.

in the mixture. We maximize $C_P = \frac{1}{6}\sum_{t=1}^{3}\sum_{b=0}^{1} u^{t,b}_{t,b}$ over valid weights $u^{t,b}_{t',b'}$ subject to the constraint of Eq. (4), that is, $\frac{1}{2}\sum_b P^{\mathrm{s}}_{1,b} = \frac{1}{2}\sum_b P^{\mathrm{s}}_{2,b} = \frac{1}{2}\sum_b P^{\mathrm{s}}_{3,b}$ (a linear program). A high value of $C_P$ ensures each of the six secondary preparations is close to its corresponding primary. Averaging over 100 runs, we find $C_P = 0.9969 \pm 0.0001$, close to the maximum of 1. An analogous linear program to select secondary measurements yields similar results. Fig. 3 also displays the outcome probabilities for the secondary procedures, confirming that they are close to ideal. Fig. 4 demonstrates how our construction enforces the operational equivalences.
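To make the constraint and the objective concrete, the sketch below checks a hand-built feasible point of such a program rather than solving it. Everything specific here is hypothetical: the qubit Bloch-vector description, the restriction to six primaries instead of eight, the single 10% error on $P_{1,0}$, and the weight table `u`, which is chosen by inspection and is not claimed to be optimal.

```python
import math

def bloch(theta):
    """Bloch vector in the x-z plane at polar angle theta (radians)."""
    return (math.sin(theta), 0.0, math.cos(theta))

labels = [(t, b) for t in (1, 2, 3) for b in (0, 1)]
axes = {t: bloch(2.0 * math.pi * (t - 1) / 3.0) for t in (1, 2, 3)}

# Hypothetical primaries: the ideal eigenstates, except that P_{1,0} is shrunk by 10%.
primary = {(t, b): tuple((-1.0) ** b * c for c in axes[t]) for (t, b) in labels}
primary[(1, 0)] = tuple(0.9 * c for c in axes[1])

# u[s][q]: weight of primary q in secondary s. Identity, except that the secondary
# version of P_{1,1} borrows a little of primary P_{1,0} to re-centre the t = 1 pair.
u = {s: {q: (1.0 if q == s else 0.0) for q in labels} for s in labels}
u[(1, 1)][(1, 1)] = 18.0 / 19.0
u[(1, 1)][(1, 0)] = 1.0 / 19.0

def mixture(weights):
    """Bloch vector of the probabilistic mixture of primaries with the given weights."""
    return tuple(sum(weights[q] * primary[q][i] for q in labels) for i in range(3))

def midpoint(vectors, t):
    """Bloch vector of the equal mixture of the b = 0 and b = 1 members of pair t."""
    return tuple((x + y) / 2.0 for x, y in zip(vectors[(t, 0)], vectors[(t, 1)]))

secondary = {s: mixture(u[s]) for s in labels}
primary_midpoints = [midpoint(primary, t) for t in (1, 2, 3)]
secondary_midpoints = [midpoint(secondary, t) for t in (1, 2, 3)]

# Objective: average weight that each secondary places on its own primary.
c_p = sum(u[s][s] for s in labels) / len(labels)
print(primary_midpoints, secondary_midpoints, c_p)
```

The primary midpoints disagree for $t = 1$, while the secondary midpoints coincide, at the cost of lowering $C_P$ slightly below 1. A linear-programming solver would search over all valid weight tables for the one that keeps $C_P$ as high as possible.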

We analyzed each experimental run separately and found the degree of correlation $p(X = b|M^{\mathrm{s}}_t, P^{\mathrm{s}}_{t,b})$ for each value of $t$ and $b$. The averages over the 100 runs are shown in Fig. 5 and are all in excess of 0.995. Averaging over $t$ and $b$ yields an experimental value $A = 0.99709 \pm 0.00007$, which violates the noncontextual bound of $5/6 \approx 0.833$ by 2300σ (Fig. 5).
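The quoted significance follows from dividing the gap between the measured value and the bound by the stated uncertainty. The short calculation below uses only the numbers given in the text; the variable names are illustrative.

```python
measured_A = 0.99709      # experimental value of A
uncertainty = 0.00007     # standard deviation of the average over the 100 runs
bound = 5.0 / 6.0         # noncontextual bound of Eq. (8)

n_sigma = (measured_A - bound) / uncertainty
print(round(n_sigma))
```

The result is a little above 2300 standard deviations, which the text rounds down.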

VI. DISCUSSION 

Using the techniques described here, it is possible to convert proofs of the failure of noncontextuality in quantum theory into experimental tests of noncontextuality that are robust to noise and experimental imprecisions [mUn]. For any phenomenon, therefore, one can determine which of its operational features are genuinely nonclassical. This is likely to have applications for scientific fields wherein quantum effects are important and for developing novel quantum technologies.

The definition of operational equivalence of preparations (measurements) required them to be statistically equivalent relative to a tomographically complete set of measurements (preparations). There are two examples of how the assumption of tomographic completeness is expected not to hold exactly in our experiment, even if one grants the correctness of quantum theory.

First, our source produces a small multi-photon component. We measure the $g^{(2)}(0)$ of our source [50] to be 0.0105 ± 0.0001 and from this we estimate the ratio of heralded detection events caused by multiple photons to those caused by single photons to be 1:4000. Regardless of the value of $A$ one presumes for multi-photon events, one can infer that the value of $A$ we would have achieved had the source been purely single-photon differs from the value given above by at most 10“®, a difference that does not affect our conclusions.

We also expect the assumption to not hold exactly because of the inevitable coupling of the polarization into the spatial degree of freedom of the photon, which could be caused, for example, by a wedge in a waveplate. Indeed, we found that if the spatial filter was omitted from the experiment, our fitting routine returned large $\chi^2$ values, which we attributed to the fact that different angles of the waveplates led to different deflections of the beam.

A more abstract worry is that nature might conflict with the assumption (and prediction of quantum theory) that three independent binary-outcome measurements are tomographically complete for the polarization of a photon. Our experiment has provided evidence in favour of the assumption insofar as we have fit data from four measurements to a theory where three are tomographically complete and found a good $\chi^2$ value for the fit. One can imagine accumulating much more evidence of this sort, but it is difficult to see how any experiment could conclusively vindicate the assumption, given that one can never test all possible measurements. This, therefore, represents the most significant loophole in experimental tests of noncontextuality, and new ideas for how one might seal it or circumvent it represent the new frontier for improving such tests.

Appendix A: Elaboration of the notion of noncontextuality and the idealizations of previous proposals for tests

In this article, we have used the operational notion of noncontextuality proposed in Ref. [34]. According to this notion, one can distinguish noncontextuality for measurements and noncontextuality for preparations. To provide formal definitions, we must first review the notion of operational equivalence.

Recall that an operational theory specifies a set of physically possible measurements, $\mathcal{M}$, and a set of physically possible preparations, $\mathcal{P}$. Each measurement $M \in \mathcal{M}$ and preparation $P \in \mathcal{P}$ is assumed to be given as a list of instructions of what to do in the laboratory. An operational theory also specifies a function $p$, which determines, for every preparation $P \in \mathcal{P}$ and every measurement $M \in \mathcal{M}$, the probability distribution over the outcome $X$ of the measurement when it is implemented on that preparation, $p(X|M,P)$.

Two measurement procedures, $M$ and $M'$, are said to be operationally equivalent if they have the same distribution over outcomes for all preparation procedures,

$$p(X|M,P) = p(X|M',P), \quad \forall P \in \mathcal{P}. \qquad (A1)$$

Two preparation procedures, $P$ and $P'$, are said to be operationally equivalent if they yield the same distribution over outcomes for all measurement procedures,

$$p(X|M,P') = p(X|M,P), \quad \forall M \in \mathcal{M}. \qquad (A2)$$

Any parameters that can be used to describe differences between the measurement procedures in a given operational equivalence class are considered to be part of the measurement context. Similarly, parameters that describe differences between preparation procedures in a given operational equivalence class are considered to be part of the preparation context. This terminological convention explains the suitability of the term context-independent or noncontextual for an ontological model wherein the representation of a given preparation or measurement depends only on the equivalence class to which it belongs (as defined below).

A tomographically complete set of preparation procedures, $\mathcal{P}_{\mathrm{tomo}} \subseteq \mathcal{P}$, is defined as one that is sufficient for determining the statistics for any other preparation procedure, and hence is sufficient for deciding operational equivalence of measurements. In other words, one can equally well define operational equivalence of measurements $M$ and $M'$ by

$$p(X|M,P) = p(X|M',P), \quad \forall P \in \mathcal{P}_{\mathrm{tomo}}. \qquad (A3)$$

Similarly, a tomographically complete set of measurement procedures, $\mathcal{M}_{\mathrm{tomo}} \subseteq \mathcal{M}$, is defined as one that is sufficient for determining the statistics for any other measurement procedure, and hence is sufficient for deciding operational equivalence of preparations, such that we can define operational equivalence of preparations $P$ and $P'$ by

$$p(X|M,P') = p(X|M,P), \quad \forall M \in \mathcal{M}_{\mathrm{tomo}}. \qquad (A4)$$
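The role of the tomographically complete set in Eq. (A4) can be illustrated with a small check. The sketch below assumes a qubit described by Bloch vectors, with the three Pauli measurements taken as the complete set; the function names and the numerical tolerance are illustrative choices (in an experiment the tolerance would be set by the statistical uncertainty).

```python
def prob(x, axis, state):
    """p(X = x | M, P) for the measurement sigma.axis on the state with Bloch vector `state`."""
    p0 = 0.5 * (1.0 + sum(a * s for a, s in zip(axis, state)))
    return p0 if x == 0 else 1.0 - p0

def equal_mixture(a, b):
    """Bloch vector of the equal mixture of two preparations."""
    return tuple((u + v) / 2.0 for u, v in zip(a, b))

def operationally_equivalent(prep_a, prep_b, measurements, tol=1e-9):
    """Eq. (A4): same outcome distribution on every measurement in the given set."""
    return all(abs(prob(x, m, prep_a) - prob(x, m, prep_b)) <= tol
               for m in measurements for x in (0, 1))

X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
TOMO = [X, Y, Z]                                       # complete for a qubit

plus_z = Z
mix_z = equal_mixture(Z, (0.0, 0.0, -1.0))             # equal mixture of the +z and -z states
mix_x = equal_mixture(X, (-1.0, 0.0, 0.0))             # equal mixture of the +x and -x states

same_mixtures = operationally_equivalent(mix_z, mix_x, TOMO)
pure_vs_mixed = operationally_equivalent(plus_z, mix_z, TOMO)
fooled_by_incomplete_set = operationally_equivalent(plus_z, mix_z, [X])
print(same_mixtures, pure_vs_mixed, fooled_by_incomplete_set)
```

The last check shows why completeness matters: relative to the single measurement along $x$, a pure state and the completely mixed state look equivalent even though a measurement along $z$ distinguishes them.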

Note that if the tomographically complete set of preparations for a given system has infinite cardinality, then it is impossible to test operational equivalence experimentally. In quantum theory, the tomographically complete set for any finite-dimensional system has finite cardinality.

Recall that an ontological model of an operational theory specifies a
space $\Lambda$ of ontic states, where an ontic state is defined as a
specification of the values of a set of classical variables that mediate
the causal influence of the preparation on the measurement. An
ontological model also specifies, for every preparation
$P \in \mathcal{P}$, a distribution $\mu(\lambda|P)$ over $\Lambda$. The
idea is that when the preparation $P$ is implemented on a system, it
emerges from the preparation device in an ontic state $\lambda$, where
$\lambda$ need not be fixed by $P$ but is instead obtained by sampling
from the distribution $\mu(\lambda|P)$. Similarly, for every measurement
$M \in \mathcal{M}$, an ontological model specifies the probabilistic
response of the measurement to $\lambda$, specified as a conditional
probability $\xi(X|M,\lambda)$, where $X$ is a variable associated to
the outcome of $M$. The idea here is that when an ontic state $\lambda$
is fed into the measurement $M$, it need not fix the outcome $X$, but
the outcome is sampled from the distribution $\xi(X|M,\lambda)$.
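The way an ontological model reproduces operational statistics, namely by averaging the response function over the distribution on ontic states (Eq. (1) of the main text), can be sketched as follows. The three-element ontic state space and all numbers are a hypothetical toy example, chosen only to make the sum explicit.

```python
from fractions import Fraction as F

# Hypothetical toy model with three ontic states, indexed 0, 1, 2.
mu_P = [F(1, 2), F(1, 4), F(1, 4)]   # mu(lambda | P), sums to 1
xi0_M = [F(1), F(1, 2), F(0)]        # xi(X=0 | M, lambda) for each lambda


def outcome_probability(x, xi0, mu):
    """p(X=x | M, P) = sum over lambda of xi(X=x | M, lambda) * mu(lambda | P)."""
    if sum(mu) != 1:
        raise ValueError("mu must be a normalized distribution")
    response = xi0 if x == 0 else [1 - r for r in xi0]
    return sum(r * m for r, m in zip(response, mu))


p0 = outcome_probability(0, xi0_M, mu_P)
p1 = outcome_probability(1, xi0_M, mu_P)
```

Note that the middle ontic state responds indeterministically (probability 1/2), which is allowed in this framework.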

The assumption of measurement noncontextuality is that measurements that
are operationally equivalent should be represented by the same
conditional probability distributions in the ontological model,

$$p(X|M,P) = p(X|M',P),\ \forall P \in \mathcal{P}_{\rm tomo} \;\rightarrow\; \xi(X|M,\lambda) = \xi(X|M',\lambda),\ \forall \lambda \in \Lambda. \qquad \text{(A5)}$$

The assumption of preparation noncontextuality is that preparations that
are operationally equivalent should be represented by the same
distributions over ontic states in the ontological model,

$$p(X|M,P) = p(X|M,P'),\ \forall M \in \mathcal{M}_{\rm tomo} \;\rightarrow\; \mu(\lambda|P) = \mu(\lambda|P'),\ \forall \lambda \in \Lambda. \qquad \text{(A6)}$$

A model is termed simply noncontextual if it is measurement
noncontextual and preparation noncontextual.

We can summarize this as follows. The grounds for thinking that two
measurement procedures are associated with the same observable, and
hence that they are represented equivalently in the noncontextual model,
is that they give equivalent statistics for all preparation procedures.
Similarly, two preparations are represented equivalently in the
noncontextual model only if they yield the same statistics for all
measurements.

The notion of noncontextuality can be understood as a version of
Leibniz’s Principle of the Identity of Indiscernibles, specifically, the
physical identity of operational indiscernibles. Other instances of the
principle’s use in physics include the inference from the lack of
superluminal signals to the lack of superluminal causal influences
(which justifies Bell’s assumption of local causality [51]), and
Einstein’s inference from the operational indistinguishability of
accelerating frames and frames fixed in a gravitational field to the
physical equivalence of such frames. The question of whether nature
admits of a noncontextual model can be understood as whether it adheres
to this version of Leibniz’s principle, at least within the framework of
ontological models that underlies the discussion of noncontextuality.

It is argued in Ref. [31] that because the principle underlying
measurement noncontextuality is the same as the one underlying
preparation noncontextuality, if one assumes the first, then one should
also assume the second.

As is shown in Ref. [34], the traditional notion of noncontextuality,
due to Kochen and Specker [3], can be understood as an application of
measurement noncontextuality to projective measurements in quantum
theory, but involves furthermore an additional assumption that
projective measurements should have a deterministic response to the
ontic state.

The idealization of noiseless measurements that we highlighted as a
problem of previous attempts to provide an experimental test of
noncontextuality can be equivalently characterized as the idealization
of deterministic responses of the measurements, as we will now show.

First, recall that determinism is not an assumption of Bell’s theorem.
Borrowing an argument from Einstein, Podolsky and Rosen [52], Bell’s
1964 argument [1] leveraged a prediction of quantum theory (that if the
same measurement is implemented on two halves of a singlet state, then
the outcomes will be perfectly anticorrelated) to derive the fact that
the local outcome-assignments must be deterministic, and from this the
first Bell inequality. But given that experimental correlations are
never perfect, no experiment can ever justify determinism. This is why
experimentalists use the Clauser-Horne-Shimony-Holt inequality [35] to
test local causality.

In Ref. [31], it was shown that if one makes an as¬ 
sumption of noncontextuality for preparations as well 
as for measurements, then one can also derive the fact 
that projective measurements should respond determin¬ 
istically to the ontic state. The inference relies on cer¬ 
tain predictions of quantum theory, in particular, that 
for every projective rank-1 measurement, there is a ba¬ 
sis of quantum states that makes its outcome perfectly 
predictable [331[31]. However, as in the case of Bell’s orig¬ 
inal inequality, the ideal of perfect predictability is not 
realized in any experiment. In particular, perfect pre¬ 
dictability only holds under the idealization of noiseless 
measurements, which is never achieved in practice. 

It is in this sense that previous proposals for testing 
noncontextuality can be understood as having made an 
unwarranted idealization of noiseless measurements. 


The second idealization that we address in this article concerns the
impossibility of realizing any two procedures that satisfy operational
equivalence exactly. No two experimental procedures ever give precisely
the same statistics. In formal terms, for any two measurements $M$ and
$M'$ that one realizes in the laboratory, it is never the case that one
achieves precise equality in Eq. (A3). Similarly, for any two
preparations $P$ and $P'$ that one realizes in the laboratory, it is
never the case that one achieves precise equality in Eq. (A4). In both
cases, this is due to the fact that, in practice, one never quite
achieves the experimental procedure that one intends to implement. The
problem for an experimental test of noncontextuality, therefore, is that
the conditions for applicability of the assumption of noncontextuality
(the antecedents in the inferences of Eqs. (A5) and (A6)) are, strictly
speaking, never satisfied.


Appendix B: Derivation and tightness of the bound 
in our noncontextuality inequality 

1. Derivation of bound 

In the main text, we only provided an argument for why our two
applications of the assumption of noncontextuality, Eqs. (3) and (5),
implied that the quantity $A$ must be bounded away from 1. Here we show
that the explicit value of this bound is $5/6$.

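By definition,

$$A \equiv \frac{1}{6} \sum_{t\in\{1,2,3\}} \sum_{b\in\{0,1\}} p(X=b|M_t,P_{t,b}). \qquad \text{(B1)}$$

Substituting for $p(X=b|M_t,P_{t,b})$ the expression in terms of the
distribution $\mu(\lambda|P_{t,b})$ and the response function
$\xi(X=b|M_t,\lambda)$ given in Eq. (1), we have

$$A = \frac{1}{6} \sum_{t\in\{1,2,3\}} \sum_{b\in\{0,1\}} \sum_{\lambda\in\Lambda} \xi(X=b|M_t,\lambda)\, \mu(\lambda|P_{t,b}). \qquad \text{(B2)}$$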

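We now simply note that there is an upper bound on each response
function that is independent of the value of $b$, namely,

$$\xi(X=b|M_t,\lambda) \le \eta(M_t,\lambda), \qquad \text{(B3)}$$

where

$$\eta(M_t,\lambda) \equiv \max_{b'\in\{0,1\}} \xi(X=b'|M_t,\lambda). \qquad \text{(B4)}$$

We therefore have

$$A \le \frac{1}{3} \sum_{t\in\{1,2,3\}} \sum_{\lambda\in\Lambda} \eta(M_t,\lambda) \left( \frac{1}{2} \sum_{b\in\{0,1\}} \mu(\lambda|P_{t,b}) \right). \qquad \text{(B5)}$$

Recalling that $P_t$ is an equal mixture of $P_{t,0}$ and $P_{t,1}$, so
that

$$\mu(\lambda|P_t) = \frac{1}{2} \sum_{b\in\{0,1\}} \mu(\lambda|P_{t,b}), \qquad \text{(B6)}$$

we can rewrite the bound as simply

$$A \le \frac{1}{3} \sum_{t\in\{1,2,3\}} \sum_{\lambda\in\Lambda} \eta(M_t,\lambda)\, \mu(\lambda|P_t). \qquad \text{(B7)}$$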

But recalling Eq. (5) from the main text,

$$\forall \lambda \in \Lambda : \mu(\lambda|P_1) = \mu(\lambda|P_2) = \mu(\lambda|P_3), \qquad \text{(B8)}$$

we see that the distribution $\mu(\lambda|P_t)$ is independent of $t$,
so we denote it by $\nu(\lambda)$ and rewrite the bound as

$$A \le \sum_{\lambda\in\Lambda} \nu(\lambda) \left( \frac{1}{3} \sum_{t\in\{1,2,3\}} \eta(M_t,\lambda) \right). \qquad \text{(B9)}$$

This last step is the first use of noncontextuality in the proof because
Eq. (B8) is derived from preparation noncontextuality and the
operational equivalence of Eq. (4). It then follows that

$$A \le \max_{\lambda\in\Lambda} \left( \frac{1}{3} \sum_{t\in\{1,2,3\}} \eta(M_t,\lambda) \right). \qquad \text{(B10)}$$


Therefore, if we can provide a nontrivial upper bound on
$\frac{1}{3}\sum_t \eta(M_t,\lambda)$ for an arbitrary ontic state
$\lambda$, we obtain a nontrivial upper bound on $A$. We infer
constraints on the possibilities for the triple
$(\eta(M_1,\lambda), \eta(M_2,\lambda), \eta(M_3,\lambda))$ from
constraints on the possibilities for the triple
$(\xi(X=0|M_1,\lambda), \xi(X=0|M_2,\lambda), \xi(X=0|M_3,\lambda))$.

The latter triple is constrained by Eq. (7) from the main text, which in
the case of $X = 0$ reads

$$\frac{1}{3} \sum_{t\in\{1,2,3\}} \xi(X=0|M_t,\lambda) = \frac{1}{2}. \qquad \text{(B11)}$$

This is the second use of noncontextuality in our proof, because
Eq. (B11) is derived from the operational equivalence of Eq. (2) and the
assumption of measurement noncontextuality.

The fact that the range of each response function is $[0,1]$ implies
that the vector
$(\xi(X=0|M_1,\lambda), \xi(X=0|M_2,\lambda), \xi(X=0|M_3,\lambda))$ is
constrained to the unit cube. The linear constraint (B11) implies that
these vectors are confined to a two-dimensional plane. The intersection
of the plane and the cube defines the polygon depicted in Fig. 6. The
six vertices of this polygon have coordinates that are a permutation of
$(1, \tfrac{1}{2}, 0)$. For every $\lambda$, the vector
$(\xi(X=0|M_1,\lambda), \xi(X=0|M_2,\lambda), \xi(X=0|M_3,\lambda))$
corresponds to a point in the convex hull of these extreme points and
given that $\frac{1}{3}\sum_t \eta(M_t,\lambda)$ is a convex function of
this vector, it suffices to find a bound on the value of this function
at the extreme points. If $\lambda$ is the extreme point
$(1, \tfrac{1}{2}, 0)$, then we have
$(\eta(M_1,\lambda), \eta(M_2,\lambda), \eta(M_3,\lambda)) = (1, \tfrac{1}{2}, 1)$,
and the other extreme points are simply permutations thereof. It follows
that

$$\frac{1}{3} \sum_{t} \eta(M_t,\lambda) \le \frac{5}{6}. \qquad \text{(B12)}$$

Substituting this bound into Eq. (B10), we have our result.

FIG. 6. The possible values of
$(\xi(X=0|M_1,\lambda), \xi(X=0|M_2,\lambda), \xi(X=0|M_3,\lambda))$.
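The vertex argument can be checked numerically. The following sketch enumerates the six permutations of $(1, 1/2, 0)$, confirms that each satisfies the constraint of Eq. (B11), and evaluates the averaged maximal response at each of them. The helper name `eta` and the use of exact fractions are choices made for this illustration only.

```python
from fractions import Fraction as F
from itertools import permutations


def eta(xi0):
    """Largest response over the two outcomes, given xi(X=0 | M_t, lambda)."""
    return max(xi0, 1 - xi0)


# The six extreme points of the polygon: permutations of (1, 1/2, 0).
vertices = sorted(set(permutations((F(1), F(1, 2), F(0)))))

# Each vertex satisfies the measurement-noncontextuality constraint (B11).
constraint_ok = all(sum(v) / 3 == F(1, 2) for v in vertices)

# Value of (1/3) * sum_t eta(M_t, lambda) at each vertex.
values = [sum(eta(x) for x in v) / 3 for v in vertices]
bound = max(values)

# An interior point of the polygon, for comparison.
interior = sum(eta(x) for x in (F(1, 2), F(1, 2), F(1, 2))) / 3
```

Every vertex gives the same value, so the maximum over the polygon is attained at all six extreme points, while the central point gives only 1/2.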


2. Tightness of bound: two ontological models 

In this section, we provide an explicit example of a noncontextual
ontological model that saturates our noncontextuality inequality, thus
proving that the noncontextuality inequality is tight, i.e., the upper
bound of the inequality cannot be reduced any further for a
noncontextual model.

We also provide an example of an ontological model that is preparation
noncontextual but fails to be measurement noncontextual (i.e. it is
measurement contextual) and that exceeds the bound of our
noncontextuality inequality. This makes it clear that preparation
noncontextuality alone does not suffice to justify the precise bound in
our inequality; the assumption of measurement noncontextuality is a
necessary ingredient as well. Given that we do not believe preparation
noncontextuality on its own to be a reasonable assumption (as discussed
in Appendix A), we highlight this fact only as a clarification of which
features of the experiment are relevant for the particular bound that we
obtain.

Note that there is no point inquiring about the bound for models that
are measurement noncontextual but preparation contextual because, as
shown in Ref. [34], quantum theory admits of models of this type: the
ontological model wherein the pure quantum states are the ontic states
(the $\psi$-complete ontological model in the terminology of Ref. [S3])
is of this sort.

FIG. 7. A noncontextual ontological model that saturates the
noncontextual bound of our inequality, exhibiting that the bound is
tight.

FIG. 8. An ontological model that is preparation noncontextual but
measurement contextual and that violates our inequality.

TABLE I. Operational statistics from the noncontextual ontological model
of Fig. 7, achieving $A = 5/6$. The shaded cells correspond to the ones
relevant for calculating $A$.

For the two ontological models we present, we begin by specifying the
ontic state space $\Lambda$. These are depicted in Figs. 7 and 8 as pie
charts with each slice corresponding to a different element of
$\Lambda$. We specify the six preparations $P_{t,b}$ by the
distributions over $\Lambda$ that they correspond to, denoted
$\mu(\lambda|P_{t,b})$ (middle left of Figs. 7 and 8). We specify the
three measurements $M_t$ by the response functions for the $X = 0$
outcome, denoted $\xi(0|M_t,\lambda)$ (top right of Figs. 7 and 8).
Finally, we compute the operational probabilities for the various
preparation-measurement pairs, using Eq. (1), and display the results in
the 6 × 4 upper-left-hand corner of Tables I and II.

In the remainder of each table, we display the operational probabilities
for the effective preparations, $P_t$, which are computed from the
operational probabilities for the $P_{t,b}$ and the fact that $P_t$ is
the uniform mixture of $P_{t,0}$ and $P_{t,1}$. We also display the
operational probabilities for the effective measurement $M_*$, which is
computed from the operational probabilities for the $M_t$ and the fact
that $M_*$ is a uniform mixture of $M_1$, $M_2$ and $M_3$.

TABLE II. Operational statistics from the preparation noncontextual and
measurement contextual ontological model of Fig. 8, achieving
$A = 9/10$. The shaded cells correspond to the ones relevant for
calculating $A$.

From the tables, we can verify that our two ontological models imply the
operational equivalences that we use in the derivation of our
noncontextuality inequality. Specifically, the three preparations $P_1$,
$P_2$ and $P_3$ yield exactly the same statistics for all of the
measurements, and the measurement $M_*$ is indistinguishable from a fair
coin flip for all the preparations.

Figs. 7 and 8 also depict $\mu(\lambda|P_t)$ for $t \in \{1,2,3\}$ for
each model (bottom left). These are determined from the
$\mu(\lambda|P_{t,b})$ via Eq. (B6). Similarly, the response function
$\xi(0|M_*,\lambda)$, which is determined from
$\xi(X=b|M_*,\lambda) = \frac{1}{3}\sum_{t\in\{1,2,3\}} \xi(X=b|M_t,\lambda)$,
is displayed in each case (bottom right).

Given the operational equivalence of $P_1$, $P_2$ and $P_3$, an
ontological model is preparation noncontextual if and only if
$\mu(\lambda|P_1) = \mu(\lambda|P_2) = \mu(\lambda|P_3)$ for all
$\lambda \in \Lambda$. We see, therefore, that both models are
preparation noncontextual.

Similarly, given the operational equivalence of $M_*$ and a fair coin
flip, an ontological model is measurement noncontextual if and only if
$\xi(0|M_*,\lambda) = \tfrac{1}{2}$ for all $\lambda \in \Lambda$. We
see, therefore, that only the first model is measurement noncontextual.

Note that in the second model, $M_*$ manages to be operationally
equivalent to a fair coin flip, despite the fact that when one
conditions on a given ontic state $\lambda$, it does not have a
uniformly random response. This is possible only because the set of
distributions is restricted in scope, and the overlaps of these
distributions with the response functions always generate the uniformly
random outcome. This highlights how an ontological model can do justice
to the operational probabilities while failing to be noncontextual.

Finally, using the operational probabilities in the tables, one can
compute the value of $A$ for each model. It is determined entirely by
the operational probabilities in the shaded cells. One thereby confirms
that $A = 5/6$ in the first model, while $A = 9/10$ in the second model.
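To make the tightness claim concrete without relying on the figures, the sketch below builds one noncontextual model that reaches $A = 5/6$. It is a construction assumed for this illustration and is not claimed to coincide with the model of Fig. 7: the ontic states are taken to be the six vertices of the polygon in the derivation above, the common distribution is uniform, and each $P_{t,b}$ is weighted toward the ontic states that favour outcome $b$ of $M_t$.

```python
from fractions import Fraction as F
from itertools import permutations

# Illustrative ontic state space (an assumption of this sketch): the six
# permutations of (1, 1/2, 0). Entry t of a state is xi(X=0 | M_t, lambda).
ontic_states = sorted(set(permutations((F(1), F(1, 2), F(0)))))
nu = F(1, 6)  # uniform distribution shared by the mixtures P_1, P_2, P_3


def xi(b, t, lam):
    """Response function xi(X=b | M_t, lambda), with t in {0, 1, 2}."""
    return lam[t] if b == 0 else 1 - lam[t]


def mu(lam, t, b):
    """mu(lambda | P_{t,b}), chosen as 2 * nu * xi(X=b | M_t, lambda)."""
    return 2 * nu * xi(b, t, lam)


# Each mu(. | P_{t,b}) is a normalized distribution.
normalized = all(
    sum(mu(lam, t, b) for lam in ontic_states) == 1
    for t in range(3) for b in (0, 1)
)

# Preparation noncontextuality: every equal mixture P_t reproduces nu.
prep_nc = all(
    (mu(lam, t, 0) + mu(lam, t, 1)) / 2 == nu
    for t in range(3) for lam in ontic_states
)

# Measurement noncontextuality: the uniform mixture M_* responds as a fair coin.
meas_nc = all(sum(xi(0, t, lam) for t in range(3)) / 3 == F(1, 2)
              for lam in ontic_states)

# The quantity A of Eq. (B1), evaluated through Eq. (B2).
A = sum(xi(b, t, lam) * mu(lam, t, b)
        for t in range(3) for b in (0, 1) for lam in ontic_states) / 6
```

Because both noncontextuality conditions hold at the level of individual ontic states, the operational equivalences follow automatically in this toy model.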


Appendix C: Constructing the secondary procedures 
from the primary ones 

1. Secondary preparations in quantum theory 

As noted in the main text, it is easiest to describe the details of our
procedure for defining secondary preparations if we make the assumption
that quantum theory correctly describes the experiment. Further on, we
will describe the procedure for a generalised probabilistic theory
(GPT).

Fig. 1 in the main text described how to define the secondary
preparations if the primary preparations deviate from the ideal only
within the $x$-$z$ plane of the Bloch sphere. Here, we consider the case
where the six primary preparations deviate from the ideals within the
bulk of the Bloch sphere. The fact that our proof only requires that the
secondary preparations satisfy Eq. (10) means that the different pairs,
$P^{\rm s}_{t,0}$ and $P^{\rm s}_{t,1}$ for $t \in \{1,2,3\}$, need not
all mix to the center of the Bloch sphere, but only to the same state.
It follows that the three pairs need not be coplanar in the Bloch
sphere. Note, however, that for any two values, $t$ and $t'$, the four
preparations $P^{\rm s}_{t,0}$, $P^{\rm s}_{t,1}$, $P^{\rm s}_{t',0}$,
$P^{\rm s}_{t',1}$ do need to be coplanar.

Any mixing procedure defines a map from each of the primary preparations
$P^{\rm p}_{t,b}$ to the corresponding secondary preparation
$P^{\rm s}_{t,b}$, which can be visualized as a motion of the
corresponding point within the Bloch sphere. To ensure that the six
secondary preparations approximate well the ideal preparations while
also defining mixed preparations $P^{\rm s}_1$, $P^{\rm s}_2$ and
$P^{\rm s}_3$ that satisfy the appropriate operational equivalences, the
mixing procedure must allow for motion in the $\pm y$ direction.
Consider what happens if one tries to achieve such motion without
supplementing the primary set with the eigenstates of
$\vec{\sigma}\cdot\hat{y}$. A given point that is biased towards $-y$
can be moved in the $+y$ direction by mixing it with another point that
has less bias in the $-y$ direction. However, because the primary
preparations are widely separated within the $x$-$z$ plane, achieving a
small motion in the $+y$ direction in this fashion comes at the price of
a large motion within the $x$-$z$ plane, implying a significant motion
away from the ideal. This problem is particularly pronounced if the
primary points are very close to coplanar.

The best way to move a given point in the $\pm y$ direction is to mix it
with a point that is at roughly the same location within the $x$-$z$
plane, but displaced in the $\pm y$ direction. This scheme, however,
would require supplementing the primary set with one or two additional
preparations for every one of its elements. Supplementing the original
set with just the two eigenstates of $\vec{\sigma}\cdot\hat{y}$
constitutes a good compromise between keeping the number of preparations
low and ensuring that the secondary preparations are close to the ideal.
Because the $\vec{\sigma}\cdot\hat{y}$ eigenstates have the greatest
possible distance from the $x$-$z$ plane, they can be used to move any
point close to that plane in the $\pm y$ direction while generating only
a modest motion within the $x$-$z$ plane.
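The geometric point can be illustrated with a one-line mixture of Bloch vectors. In the sketch below, the primary Bloch vector and the helper `mix` are hypothetical; the example only shows that cancelling a small $-y$ bias by mixing with the $+y$ eigenstate needs a small weight, and therefore rescales the $x$ and $z$ components only slightly.

```python
from fractions import Fraction as F


def mix(r, s, w):
    """Bloch vector of the mixture (1 - w) * r + w * s."""
    return tuple((1 - w) * ri + w * si for ri, si in zip(r, s))


# Hypothetical primary preparation, slightly biased towards -y.
primary = (F(19, 20), F(-1, 20), F(1, 10))
y_plus = (F(0), F(1), F(0))  # +1 eigenstate of sigma.y

# Weight that cancels the y component: (1 - w) * r_y + w = 0.
w = -primary[1] / (1 - primary[1])
secondary = mix(primary, y_plus, w)

# The x and z components are rescaled by the common factor 1 - w.
shrink = 1 - w
```

Here the required weight is 1/21, so the in-plane components shrink by a factor of 20/21 while the $y$ component is removed entirely.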


2. Secondary measurements in quantum theory 

Just as with the case of preparations, we solve the problem of no strict
statistical equivalences for measurements by noting that from the
primary set of measurements, $M^{\rm p}_1$, $M^{\rm p}_2$ and
$M^{\rm p}_3$, one can infer the statistics of a large family of
measurements, and one can find three measurements within this family,
called the secondary measurements and denoted $M^{\rm s}_1$,
$M^{\rm s}_2$ and $M^{\rm s}_3$, such that their mixture,
$M^{\rm s}_*$, satisfies the operational equivalence of Eq. (2) exactly.
To give the details of our approach, it is again useful to begin with
the quantum description.

A geometric visualization of the construction is also possible in this
case. Just as a density operator can be written
$\rho = \tfrac{1}{2}(I + \vec{r}\cdot\vec{\sigma})$ to define a
three-dimensional Bloch vector $\vec{r}$, an effect can be written
$E = \tfrac{1}{2}(e_0 I + \vec{e}\cdot\vec{\sigma})$ to define a
four-dimensional Bloch-like vector $(e_0, \vec{e})$, whose four
components we will call the $I$, $x$, $y$ and $z$ components. Note that
$e_0 = {\rm tr}(E)$, while $e_x = {\rm tr}(\vec{\sigma}\cdot\hat{x}\,E)$
and so forth. The eigenvalues of $E$ are expressed in terms of these
components as $\tfrac{1}{2}(e_0 \pm |\vec{e}|)$. Consequently, the
constraint that $0 \le E \le I$ takes the form of three inequalities
$0 \le e_0 \le 2$, $|\vec{e}| \le e_0$ and $|\vec{e}| \le 2 - e_0$. This
corresponds to the intersection of two cones. For the case $e_y = 0$,
the Bloch representation of the effect space is three-dimensional and is
displayed in Fig. 9. When portraying binary-outcome measurements
associated to a POVM $\{E, I - E\}$ in this representation, it is
sufficient to portray the Bloch-like vector $(e_0, \vec{e})$ for outcome
$E$ alone, given that the vector for $I - E$ is simply
$(2 - e_0, -\vec{e})$. Similarly, to describe any mixture of two such
POVMs, it is sufficient to describe the mixture of the effects
corresponding to the first outcome.
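The Bloch-like parametrization of an effect and the double-cone constraint can be made explicit for a qubit. The following is a minimal sketch in which a 2x2 Hermitian matrix is given as nested lists; the function names and the example matrices are illustrative assumptions.

```python
import math


def bloch_like_vector(E):
    """Return (e0, ex, ey, ez) such that E = (e0 * I + e . sigma) / 2.

    E is a 2x2 Hermitian matrix [[a, b], [conj(b), d]] given as nested lists.
    """
    (a, b), (_, d) = E
    a, b, d = complex(a), complex(b), complex(d)
    e0 = (a + d).real       # tr(E)
    ez = (a - d).real       # tr(sigma_z E)
    ex = 2 * b.real         # tr(sigma_x E)
    ey = -2 * b.imag        # tr(sigma_y E)
    return e0, ex, ey, ez


def is_valid_effect(E, tol=1e-12):
    """Check 0 <= E <= I via 0 <= e0 <= 2, |e| <= e0 and |e| <= 2 - e0."""
    e0, ex, ey, ez = bloch_like_vector(E)
    norm = math.sqrt(ex ** 2 + ey ** 2 + ez ** 2)
    return (-tol <= e0 <= 2 + tol) and norm <= e0 + tol and norm <= 2 - e0 + tol


# Hypothetical unsharp, biased effect with eigenvalues 0.9 and 0.2.
E = [[0.9, 0.0], [0.0, 0.2]]
vec = bloch_like_vector(E)
valid = is_valid_effect(E)

# A matrix with an eigenvalue above 1 is not an effect.
invalid = is_valid_effect([[1.2, 0.0], [0.0, 0.1]])
```

The eigenvalues of the example are recovered as $(e_0 \pm |\vec{e}|)/2$, which for the first matrix gives 0.9 and 0.2.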

The family of measurements that is defined in terms of the primary set
is slightly different than what we had for preparations. The reason is
that each primary measurement on its own generates a family of
measurements by probabilistic post-processing of its outcome. If we
denote the outcome of the original measurement by $X$ and that of the
processed measurement by $X'$, then the probabilistic processing is a
conditional probability $p(X'|X)$. It is sufficient to determine the
convexly-extremal post-processings, since all others can be obtained
from these by mixing. For the case of binary-outcome measurements
considered here, there are just four extremal post-processings: the
identity process, $p(X'|X) = \delta_{X',X}$; the process that flips the
outcome, $p(X'|X) = \delta_{X',X\oplus 1}$; the process that always
generates the outcome $X' = 0$, $p(X'|X) = \delta_{X',0}$; and the
process that always generates the outcome $X' = 1$,
$p(X'|X) = \delta_{X',1}$. Applying these to our three primary
measurements, we obtain eight measurements in all: the two that generate
a fixed outcome, the three originals, and the three originals with the
outcome flipped. If the set of primary measurements corresponded to the
ideal set, then the eight extremal post-processings would correspond to
the observables $0$, $I$, $\vec{\sigma}\cdot\hat{n}_1$,
$-\vec{\sigma}\cdot\hat{n}_1$, $\vec{\sigma}\cdot\hat{n}_2$,
$-\vec{\sigma}\cdot\hat{n}_2$, $\vec{\sigma}\cdot\hat{n}_3$,
$-\vec{\sigma}\cdot\hat{n}_3$.

In practice, the last six measurements will be unsharp. These eight
measurements can then be mixed probabilistically to define the family of
measurements from which the secondary measurements must be chosen. We
refer to this family as the convex hull of the post-processings of the
primary set.
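The four extremal post-processings can be written as conditional probability tables and applied to the outcome-0 probability of each primary measurement. The sketch below does this for hypothetical probabilities; the table layout `post[x_new][x_old]` and all names are assumptions of the illustration.

```python
from fractions import Fraction as F

# Conditional probabilities p(X'|X), stored as post[x_new][x_old].
EXTREMAL = {
    "identity": [[1, 0], [0, 1]],
    "flip":     [[0, 1], [1, 0]],
    "always_0": [[1, 1], [0, 0]],
    "always_1": [[0, 0], [1, 1]],
}


def post_process(post, p0):
    """Probability of X' = 0 after post-processing a measurement with p(X=0) = p0."""
    p = (p0, 1 - p0)
    return sum(post[0][x] * p[x] for x in (0, 1))


# Hypothetical outcome-0 probabilities of three primary measurements
# on one fixed preparation.
primary = {"M1": F(9, 10), "M2": F(1, 2), "M3": F(1, 5)}

# The constant processes give the same measurement whatever they are applied to,
# so the family has 2 + 3 + 3 = 8 members.
family = {"always_0": F(1), "always_1": F(0)}
for name, p0 in primary.items():
    family[name] = post_process(EXTREMAL["identity"], p0)
    family[name + "_flipped"] = post_process(EXTREMAL["flip"], p0)
```

Any probabilistic mixture of these eight entries then gives the statistics of a member of the convex hull described in the text.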

We will again start with a simplified example, wherein the primary
measurements have Bloch-like vectors with vanishing component along $y$,
$e_y = 0$, and unit component along $I$, $e_0 = 1$, so that
$E = \tfrac{1}{2}(I + e_x\,\vec{\sigma}\cdot\hat{x} + e_z\,\vec{\sigma}\cdot\hat{z})$.
In this case, the constraint $0 \le E \le I$ reduces to
$|\vec{e}| \le 1$, which is the same constraint that applies to density
operators confined to the $x$-$z$ plane of the Bloch sphere. Here, the
only deviation from the ideal is within this plane, and the construction
is precisely analogous to what is depicted in Fig. 1 of the main text.

Unlike the case of preparations, however, the primary measurements can
deviate from the ideal in the $I$ direction, that is, $E$ may have a
component along $I$ that deviates from 1, which corresponds to
introducing a state-independent bias on the outcome of the measurement.
This is where the extremal post-processings yielding the
constant-outcome measurements corresponding to the observables $0$ and
$I$ come in. They allow one to move in the $\pm I$ direction.

Fig. 9 presents an example wherein the primary measurements have
Bloch-like vectors that deviate from the ideal not only within the
$x$-$z$ plane, but in the $I$ direction as well (it is still presumed,
however, that all components in the $y$ direction are vanishing).

In practice, of course, the $y$ component of our measurements never
vanishes precisely either. We therefore apply the same trick as we did
for the preparations. We supplement the set of primary measurements with
an additional measurement, denoted $M^{\rm p}_4$, that ideally
corresponds to the observable $\vec{\sigma}\cdot\hat{y}$. The
post-processing which flips the outcome then corresponds to the
observable $-\vec{\sigma}\cdot\hat{y}$. Mixing the primary measurements
with $M^{\rm p}_4$ and its outcome-flipped counterpart allows motion in
the $\pm y$ direction within the Bloch cone.

FIG. 9. A depiction of the construction of secondary measurements from
primary ones in the simplified case where the component along $y$ is
zero. For each measurement, we specify the point corresponding to the
Bloch representation of its first outcome. These are labelled
$[0|M_1]$, $[0|M_2]$ and $[0|M_3]$. The equal mixture of these three,
labelled $[0|M_*]$, is the centroid of these three points, i.e. the
point equidistant from all three. a, The ideal measurements $[0|M_t]$
with centroid at $I/2$, illustrating that the operational equivalence
(2) is satisfied exactly. b, Errors in the experiment (exaggerated) will
imply that the realized measurements $[0|M^{\rm p}_t]$ (termed primary)
will deviate from the ideal, and their centroid deviates from $I/2$. The
family of points corresponding to probabilistic mixtures of the
$[0|M^{\rm p}_t]$ and the observables $0$ and $I$ are depicted by the
grey region. (For clarity, we have not depicted the outcome-flipped
versions of the three primary measurements, and have not included them
in the probabilistic mixtures. As we note in the text, such a
restriction still allows for a good construction.) c, The secondary
measurements $M^{\rm s}_t$ that have been chosen from this grey region.
They are chosen such that their centroid is at $I/2$, restoring the
operational equivalence (2).

Note that the capacity to move in both the $y$ and the $-y$ direction is
critical for achieving the operational equivalence of Eq. (2), because
if the secondary measurements had a common bias in the $y$ direction,
they could not mix to the POVM $\{I/2, I/2\}$ as Eq. (9) requires. For
the preparations, by contrast, supplementing the primary set by just one
of the eigenstates of $\vec{\sigma}\cdot\hat{y}$ would still work, given
that the mixed preparations $P^{\rm s}_t$ do not need to coincide with
the completely mixed state $I/2$.

The secondary measurements $M^{\rm s}_1$, $M^{\rm s}_2$ and
$M^{\rm s}_3$ are then chosen from the convex hull of the
post-processings of the $M^{\rm p}_1$, $M^{\rm p}_2$, $M^{\rm p}_3$,
$M^{\rm p}_4$. Without this supplementation, it may be impossible to
find secondary measurements that define an $M^{\rm s}_*$ that satisfies
the operational equivalences while providing a good approximation to the
ideal measurements.

In all, under the extremal post-processings of the supplemented set of
primary measurements, we obtain ten points which ideally correspond to
the observables $0$, $I$, $\vec{\sigma}\cdot\hat{n}_1$,
$-\vec{\sigma}\cdot\hat{n}_1$, $\vec{\sigma}\cdot\hat{n}_2$,
$-\vec{\sigma}\cdot\hat{n}_2$, $\vec{\sigma}\cdot\hat{n}_3$,
$-\vec{\sigma}\cdot\hat{n}_3$, $\vec{\sigma}\cdot\hat{y}$, and
$-\vec{\sigma}\cdot\hat{y}$.

Note that the outcome-flipped versions of the three primary measurements
are not critical for defining a good set of secondary measurements, and
indeed we find that we can dispense with them and still obtain good
results. This is illustrated in the example of Fig. 9.


3. Secondary preparations and measurements in 
generalised probabilistic theories 

We do not want to presuppose that our experiment is well fit by a
quantum description. Therefore instead of working with density operators
and POVMs, we work with GPT states and effects, which are inferred from
the matrix

$$D^{\rm p} = \begin{pmatrix} p^1_{1,0} & p^1_{1,1} & \cdots & p^1_{4,0} & p^1_{4,1} \\ p^2_{1,0} & p^2_{1,1} & \cdots & p^2_{4,0} & p^2_{4,1} \\ p^3_{1,0} & p^3_{1,1} & \cdots & p^3_{4,0} & p^3_{4,1} \\ p^4_{1,0} & p^4_{1,1} & \cdots & p^4_{4,0} & p^4_{4,1} \end{pmatrix}, \qquad \text{(C1)}$$

where

$$p^t_{t',b} \equiv p(0|M^{\rm p}_t, P^{\rm p}_{t',b}) \qquad \text{(C2)}$$

is the probability of obtaining outcome 0 in the $t$th measurement that
was actually realized in the experiment (recall that we term this
measurement primary and denote it by $M^{\rm p}_t$), when it follows the
$(t',b)$th preparation that was actually realized in the experiment
(recall that we term this preparation primary and denote it by
$P^{\rm p}_{t',b}$). These probabilities are estimated by fitting the
raw experimental data (which are merely finite samples of the true
probabilities) to a GPT; we postpone the description of this procedure
to Sec. D 1.

The rows of the $D^{\rm p}$ matrix define the GPT effects. We denote the
vector defined by the $t$th row, which is associated to the measurement
event $[0|M^{\rm p}_t]$ (obtaining the 0 outcome in the primary
measurement $M^{\rm p}_t$), by $\mathbf{M}^{\rm p}_t$. Similarly, the
columns of this matrix define the GPT states. We denote the vector
associated to the $(t,b)$th column, which is associated to the primary
preparation $P^{\rm p}_{t,b}$, by $\mathbf{P}^{\rm p}_{t,b}$.

As described in the main text, we define the secondary preparation
$P^{\rm s}_{t,b}$ by a probabilistic mixture of the primary
preparations. Thus, the GPT state of the secondary preparation is a
vector $\mathbf{P}^{\rm s}_{t,b}$ that is a probabilistic mixture of the
$\mathbf{P}^{\rm p}_{t',b'}$,

$$\mathbf{P}^{\rm s}_{t,b} = \sum_{t'=1}^{4} \sum_{b'=0}^{1} u^{t,b}_{t',b'}\, \mathbf{P}^{\rm p}_{t',b'}, \qquad \text{(C3)}$$

where the $u^{t,b}_{t',b'}$ are the weights in the mixture.

A secondary measurement $M^{\rm s}_t$ is obtained from the primary
measurements in a similar fashion, but in addition to probabilistic
mixtures, one must allow certain post-processings of the measurements,
in analogy to the quantum case described above.

The set of all post-processings of the primary outcome-0 measurement
events has extremal elements consisting of the outcome-0 measurement
events themselves together with: the measurement event that always
occurs (i.e. obtaining outcome ‘0’ or ‘1’), which is represented by the
vector of probabilities where every entry is 1, denoted $\mathbf{1}$;
the measurement event that never occurs (i.e. obtaining neither outcome
‘0’ nor outcome ‘1’), which is represented by the vector of
probabilities where every entry is 0, denoted $\mathbf{0}$; and the
outcome-1 measurement events, $[1|M^{\rm p}_t]$, which are represented
by the vectors $\mathbf{1} - \mathbf{M}^{\rm p}_t$.

We can therefore define our three secondary outcome-0 measurement events
as probabilistic mixtures of the four primary ones as well as the
extremal post-processings mentioned above, that is

$$\mathbf{M}^{\rm s}_t = \sum_{t'=1}^{4} v^t_{t'}\, \mathbf{M}^{\rm p}_{t'} + v^t_{\mathbf{0}}\, \mathbf{0} + v^t_{\mathbf{1}}\, \mathbf{1} + \sum_{t''=1}^{4} v^t_{\neg t''}\, (\mathbf{1} - \mathbf{M}^{\rm p}_{t''}), \qquad \text{(C4)}$$

where for each $t$, the vector of weights in the mixture is
$(v^t_1, v^t_2, v^t_3, v^t_4, v^t_{\mathbf{0}}, v^t_{\mathbf{1}}, v^t_{\neg 1}, v^t_{\neg 2}, v^t_{\neg 3}, v^t_{\neg 4})$.
We see that this is a particular type of linear transformation on the
rows.

Again, as mentioned in the discussion of the quantum case, we can in
fact limit the post-processing to exclude the outcome-1 measurement
events for $M^{\rm p}_1$, $M^{\rm p}_2$ and $M^{\rm p}_3$, keeping only
the outcome-1 event for $M^{\rm p}_4$, and still obtain good results.
Thus we found it sufficient to search for secondary outcome-0
measurement events among those of the form

$$\mathbf{M}^{\rm s}_t = \sum_{t'=1}^{4} v^t_{t'}\, \mathbf{M}^{\rm p}_{t'} + v^t_{\mathbf{0}}\, \mathbf{0} + v^t_{\mathbf{1}}\, \mathbf{1} + v^t_{\neg 4}\, (\mathbf{1} - \mathbf{M}^{\rm p}_4), \qquad \text{(C5)}$$

where for each $t$, the vector of weights in the mixture is
$(v^t_1, v^t_2, v^t_3, v^t_4, v^t_{\mathbf{0}}, v^t_{\mathbf{1}}, v^t_{\neg 4})$.
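The restricted mixture of Eq. (C5) is a simple row operation on the matrix of primary probabilities. The sketch below applies it to a hypothetical matrix with only two preparation columns; the function name, the argument names (`v_zero`, `v_one` and `v_not4` stand for the weights on the never-occurring event, the always-occurring event and the outcome-flipped fourth measurement) and all numbers are assumptions made for illustration, not values from the experiment.

```python
from fractions import Fraction as F


def secondary_effect(primary_rows, v, v_zero, v_one, v_not4):
    """Mix four primary effect vectors with the zero event, the unit event
    and the outcome-flipped fourth measurement, in the form of Eq. (C5)."""
    weights = list(v) + [v_zero, v_one, v_not4]
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must form a probability distribution")
    n_columns = len(primary_rows[0])
    mixed = []
    for k in range(n_columns):
        entry = sum(v[i] * primary_rows[i][k] for i in range(4))
        # v_zero multiplies the all-zeros vector and so contributes nothing.
        entry += v_one * 1 + v_not4 * (1 - primary_rows[3][k])
        mixed.append(entry)
    return mixed


# Hypothetical outcome-0 probabilities: rows are M1..M4, columns two preparations.
primary_rows = [
    [F(9, 10), F(1, 10)],
    [F(1, 2), F(1, 2)],
    [F(1, 5), F(4, 5)],
    [F(1, 2), F(3, 5)],
]

# Mostly the first primary measurement, with small constant-outcome admixtures.
row = secondary_effect(primary_rows,
                       v=[F(19, 20), 0, 0, 0],
                       v_zero=F(1, 40), v_one=F(1, 40), v_not4=0)
```

Keeping the weight on the corresponding primary measurement close to 1, as in this example, is what keeps a secondary measurement close to its primary counterpart.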

Returning to the preparations, we choose the weights $u^{t,b}_{t',b'}$
to maximize the function

$$C_P = \frac{1}{6} \sum_{t=1}^{3} \sum_{b=0}^{1} u^{t,b}_{t,b} \qquad \text{(C6)}$$

subject to the linear constraint

$$\frac{1}{2} \sum_{b} \mathbf{P}^{\rm s}_{1,b} = \frac{1}{2} \sum_{b} \mathbf{P}^{\rm s}_{2,b} = \frac{1}{2} \sum_{b} \mathbf{P}^{\rm s}_{3,b}, \qquad \text{(C7)}$$

as noted in the main text. This optimization ensures that the secondary
preparations are as close as possible to the primary ones while ensuring
that they satisfy the relevant operational equivalence exactly.
Table III reports the weights $u^{t,b}_{t',b'}$ that were obtained from
this optimization procedure, averaged over the 100 runs of the
experiment. As noted in the main text, these weights yield
$C_P = 0.9969 \pm 0.0001$, indicating that the secondary preparations
are indeed very close to the primary ones.

The scheme for finding the weights
$(v^t_1, v^t_2, v^t_3, v^t_4, v^t_{\mathbf{0}}, v^t_{\mathbf{1}}, v^t_{\neg 4})$
that define the secondary measurements is analogous. Using a linear
program, we find the vector of such weights that maximizes the function

$$C_M = \frac{1}{3} \sum_{t=1}^{3} v^t_t \qquad \text{(C8)}$$

subject to the constraint that

$$\mathbf{M}^{\rm s}_* = \tfrac{1}{2} \mathbf{1}, \qquad \text{(C9)}$$

where $\mathbf{M}^{\rm s}_* \equiv \frac{1}{3}\sum_{t=1}^{3} \mathbf{M}^{\rm s}_t$.
A high value of $C_M$ signals that each of the three secondary
measurements is close to the corresponding primary one. Table IV reports
the weights we obtain from this optimization procedure, averaged over
the 100 runs of the experiment. These weights yield
$C_M = 0.9976 \pm 0.0001$, again indicating the closeness of the
secondary measurements to the primary ones.

This optimization defines the precise linear transformation of the rows
of $D^{\rm p}$ and the linear transformation of the columns of
$D^{\rm p}$ that serve to define the secondary preparations and
measurements. By combining the operations on the rows and on the
columns, we obtain from $D^{\rm p}$ a 3 × 6 matrix, denoted
$D^{\rm s}$, whose entries $s^t_{t',b}$ are

$$s^t_{t',b} = \sum_{\tau=1}^{4} \sum_{\beta=0}^{1} u^{t',b}_{\tau,\beta} \left[ \sum_{\tau'=1}^{4} v^t_{\tau'}\, p^{\tau'}_{\tau,\beta} + v^t_{\mathbf{0}} \cdot 0 + v^t_{\mathbf{1}} \cdot 1 + v^t_{\neg 4}\, (1 - p^4_{\tau,\beta}) \right], \qquad \text{(C10)}$$

where $t', t \in \{1,2,3\}$, $b \in \{0,1\}$. This matrix describes the
secondary preparations $P^{\rm s}_{t',b}$ and measurements
$M^{\rm s}_t$. The component $s^t_{t',b}$ of this matrix describes the
probability of obtaining outcome 0 in measurement $M^{\rm s}_t$ on
preparation $P^{\rm s}_{t',b}$, that is,

$$s^t_{t',b} = p(0|M^{\rm s}_t, P^{\rm s}_{t',b}). \qquad \text{(C11)}$$

These probabilities are the ones that are used to calculate the value of
$A$ via Eq. (6) of the main text.


Appendix D: Data analysis

1. Fitting the raw data to a generalised probabilistic theory

In our experiment we perform four measurements on each of eight input
states. If we define $r^t_{t',b}$ as the fraction of ‘0’ outcomes
returned by measurement $M_t$ on preparation $P_{t',b}$, the results can
be summarized in a 4 × 8 matrix of raw data, $D^{\rm r}$, defined as:

$$D^{\rm r} = \begin{pmatrix} r^1_{1,0} & r^1_{1,1} & \cdots & r^1_{4,0} & r^1_{4,1} \\ r^2_{1,0} & r^2_{1,1} & \cdots & r^2_{4,0} & r^2_{4,1} \\ r^3_{1,0} & r^3_{1,1} & \cdots & r^3_{4,0} & r^3_{4,1} \\ r^4_{1,0} & r^4_{1,1} & \cdots & r^4_{4,0} & r^4_{4,1} \end{pmatrix}. \qquad \text{(D1)}$$

Each row of $D^{\rm r}$ corresponds to a measurement, ordered from top
to bottom as $M_1$, $M_2$, $M_3$, and $M_4$. Similarly, the columns are
labelled from left to right as $P_{1,0}$, $P_{1,1}$, $P_{2,0}$,
$P_{2,1}$, $P_{3,0}$, $P_{3,1}$, $P_{4,0}$, and $P_{4,1}$.

In order to test the assumption that three independent binary-outcome
measurements are tomographically complete for our system, we fit the raw
data to a matrix, $D^{\rm p}$, of primary data defined in Eq. (C1).
$D^{\rm p}$ contains the outcome probabilities of four measurements on
eight states in the GPT-of-best-fit to the raw data. We fit to a GPT in
which three 2-outcome measurements are tomographically complete, which
we characterize with the following result.

Proposition 1 A matrix $D^{\rm p}$ can arise from a GPT in which three
two-outcome measurements are tomographically complete if and (with a
measure zero set of exceptions) only if
$a\,p^1_{t,b} + b\,p^2_{t,b} + c\,p^3_{t,b} + d\,p^4_{t,b} - 1 = 0$ for
some real constants $\{a, b, c, d\}$.

Proof. We begin with the “only if” part. Following [m |42], if a set of
two-outcome measurements $M_a$, $M_b$, $M_c$ (called fiducial
measurements) are tomographically complete for a system, then the state
of the system given a preparation $P$ can be specified by the vector

p{0\Ma,P)\ _) 

P- p(0|Mb,P) 

\p{0\Mc,P)J 

(where the first entry indicates that the state is normal¬ 
ized). In [m 135] it is shown that convexity then re¬ 
quires that the probability of outcome ‘ 0 ’ for any mea¬ 
surement M is given by r • p for some vector r. Let 
ri,r 2 ,rg,r 4 correspond to outcome ‘ 0 ’ of the measure¬ 
ments Ml, M 2 , Mg, M 4 , and note that the measurement 
event that always occurs, regardless of the preparation 
(e.g. the event of obtaining either outcome ‘ 0 ’ or ‘ 1 ’ in 
any binary-outcome measurement), must be represented 
by ri = (1,0,0,0). Since the ri, r 2 , rg, r 4 , ri are a set of 
            P^p_{1,0}   P^p_{1,1}   P^p_{2,0}   P^p_{2,1}   P^p_{3,0}   P^p_{3,1}   P^p_{4,0}   P^p_{4,1}
P^s_{1,0}   [0.99483]    0.00023     0.00029     0.00092     0.00016     0.00031     0.00324     0.00003
P^s_{1,1}    0.00002    [0.9979…]    0.00014     0.00026     0.00006     0.00005     0.00154     0.00002
P^s_{2,0}    0.00065     0.00008    [0.9968…]    0.00003     0.00001     0.00029     0.00002     0.00208
P^s_{2,1}    0.00134     0.00015     0.00009    [0.9948…]    0.00008     0.00028     0.00000     0.00323
P^s_{3,0}    0.00008     0.00023     0.00011     0.00000    [0.9988…]    0.00004     0.00044     0.00027
P^s_{3,1}    0.00011     0.00023     0.00022     0.00016     0.00016    [0.9980…]    0.00050     0.00061

TABLE III. Each of the six secondary preparation procedures, denoted $P^{s}_{t,b}$ where $t \in \{1,2,3\}$, $b \in \{0,1\}$ (the rows), is a probabilistic mixture of the eight primary preparation procedures, denoted $P^{p}_{t',b'}$ where $t' \in \{1,2,3,4\}$, $b' \in \{0,1\}$ (the columns). The table presents the weights appearing in each such mixture, denoted $u^{t,b}_{t',b'}$ in the main text. These are determined numerically by maximizing the function $C_P = \frac{1}{6}\sum_{t=1}^{3}\sum_{b=0}^{1} u^{t,b}_{t,b}$ (the average of the weights appearing in the shaded cells, shown here in square brackets), which quantifies the closeness of the secondary procedures to the primary ones, subject to the constraint of operational equivalence of the uniform mixtures of $P^{s}_{t,0}$ and $P^{s}_{t,1}$ for $t \in \{1,2,3\}$. The values presented are averages over 100 runs.
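
The linear program described in the caption of Table III can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `secondary_preparation_weights`, the use of `scipy.optimize.linprog`, and the idealized hexagon of Bloch vectors used as input are all assumptions made for the example. Each primary preparation is represented by its vector of outcome-0 probabilities for three fiducial measurements.

```python
import numpy as np
from scipy.optimize import linprog

def secondary_preparation_weights(primary):
    """primary: (8, d) array, one GPT state vector per primary preparation,
    ordered (1,0), (1,1), (2,0), ..., (4,1).
    Returns (U, C_P): U[k, j] is the weight of primary j in secondary k."""
    primary = np.asarray(primary, dtype=float)
    n_pri, d = primary.shape
    n_sec = 6
    n_var = n_sec * n_pri

    # Maximise C_P = (1/6) * sum_k U[k, k]; linprog minimises, so negate.
    cost = np.zeros(n_var)
    for k in range(n_sec):
        cost[k * n_pri + k] = -1.0 / n_sec

    rows, rhs = [], []
    # Each secondary preparation is a probabilistic mixture: weights sum to 1.
    for k in range(n_sec):
        row = np.zeros(n_var)
        row[k * n_pri:(k + 1) * n_pri] = 1.0
        rows.append(row)
        rhs.append(1.0)
    # Operational equivalence: uniform mixtures for t = 2, 3 equal that for t = 1.
    for t in (1, 2):
        for comp in range(d):
            row = np.zeros(n_var)
            for k, sign in ((0, 1.0), (1, 1.0), (2 * t, -1.0), (2 * t + 1, -1.0)):
                row[k * n_pri:(k + 1) * n_pri] += 0.5 * sign * primary[:, comp]
            rows.append(row)
            rhs.append(0.0)

    res = linprog(cost, A_eq=np.array(rows), b_eq=np.array(rhs),
                  bounds=(0.0, 1.0), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n_sec, n_pri), -res.fun

# Hypothetical ideal input: six equatorial Bloch vectors (antipodal pairs) plus +/- z.
angles = np.deg2rad([0, 180, 120, 300, 240, 60])
bloch = [(np.cos(a), np.sin(a), 0.0) for a in angles] + [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
ideal = 0.5 * (1.0 + np.array(bloch))   # outcome-0 probabilities for x, y, z measurements
U, c_p = secondary_preparation_weights(ideal)
```

For ideal input the equivalence already holds, so the program returns the trivial mixture with $C_P = 1$; with noisy input the off-diagonal weights become small but nonzero, as in Table III.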



             [0|M^p_1]   [0|M^p_2]   [0|M^p_3]   [0|M^p_4]   [1|M^p_4]      1           0
[0|M^s_1]    [0.99707]    0.00004     0.00015     0.00010     0.00208     0.00031     0.00025
[0|M^s_2]     0.00007    [0.99727]    0.00012     0.00004     0.00199     0.00028     0.00023
[0|M^s_3]     0.00004     0.00002    [0.9984…]    0.00001     0.00117     0.00019     0.00012

TABLE IV. Each of the three secondary outcome-0 measurement events, denoted $[0|M^{s}_{t}]$ where $t \in \{1,2,3\}$ (the rows), is a probabilistic mixture of the four primary outcome-0 measurement events, denoted $[0|M^{p}_{t'}]$ where $t' \in \{1,2,3,4\}$, and three processings thereof, denoted $[1|M^{p}_{4}]$, $\mathbb{1}$, and 0 (the seven columns). The table presents the weights appearing in each such mixture. These are determined numerically by maximizing the function $C_M = \frac{1}{3}\sum_{t=1}^{3} v^{t}_{t}$ (the average of the weights appearing in the shaded cells, shown here in square brackets), which quantifies the closeness of the secondary procedures to the primary ones, subject to the constraint of operational equivalence between the uniform mixture of $M^{s}_{1}$, $M^{s}_{2}$ and $M^{s}_{3}$ and a fair coin flip. The values presented are averages over 100 runs.


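five four-dimensional vectors, they must be linearly dependent:

$$a'\mathbf{r}_1 + b'\mathbf{r}_2 + c'\mathbf{r}_3 + d'\mathbf{r}_4 + e'\mathbf{r}_{\mathbb{1}} = 0 \tag{D3}$$

with $(a', b', c', d', e') \neq (0,0,0,0,0)$. The set of $\mathbf{r}$ for which $e'$ must be zero are those where $\mathbf{r}_{\mathbb{1}}$ is not in the span of $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{r}_4$, which is a set of measure zero. Hence we can generically ensure $e' \neq 0$ and divide Eq. (D3) through by $-e'$ to obtain

$$a\mathbf{r}_1 + b\mathbf{r}_2 + c\mathbf{r}_3 + d\mathbf{r}_4 - \mathbf{r}_{\mathbb{1}} = 0 \tag{D4}$$

where $a = -a'/e'$, $b = -b'/e'$ and so on.

Finally, letting $\mathbf{p}_{t,b}$ denote the column vector of the form of Eq. (D2) that is associated to the preparation $P_{t,b}$, and noting that by definition

$$p^{t'}_{t,b} = \mathbf{r}_{t'} \cdot \mathbf{p}_{t,b}, \tag{D5}$$

we see that by taking the dot product of Eq. (D4) with each $\mathbf{p}_{t,b}$, and using $\mathbf{r}_{\mathbb{1}} \cdot \mathbf{p}_{t,b} = 1$ for the always-occurring event $\mathbf{r}_{\mathbb{1}} = (1,0,0,0)$, we obtain the desired constraint on $D^p$.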

For the “if” part, we assume the constraint and demonstrate that there exists a triple of binary-outcome measurements, $M_a$, $M_b$, and $M_c$, that are tomographically complete for the GPT. To establish this, it is sufficient to take the fiducial set, $M_a$, $M_b$ and $M_c$, to be $M_1$, $M_2$, and $M_3$, so that preparation $P_{t,b}$ corresponds to the vector

$$\mathbf{p}_{t,b} = \begin{pmatrix} 1 \\ p^{1}_{t,b} \\ p^{2}_{t,b} \\ p^{3}_{t,b} \end{pmatrix} \tag{D6}$$

In this case, we can recover $D^p$ if $M_1$, $M_2$, and $M_3$ are represented by $\mathbf{r}_1 = (0,1,0,0)$, $\mathbf{r}_2 = (0,0,1,0)$ and $\mathbf{r}_3 = (0,0,0,1)$, whilst the assumed constraint implies that $\mathbf{r}_4 = -(-1, a, b, c)/d$. ■

Geometrically, the proposition dictates that the eight columns of $D^p$ lie on the 3-dimensional hyperplane defined by the constants $\{a, b, c, d\}$.
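
As a small numerical check of the “if” direction, the snippet below builds the effect vector $\mathbf{r}_4 = -(-1, a, b, c)/d$ and confirms that the probability it assigns satisfies the hyperplane constraint. The constants and the state used here are hypothetical values chosen only for illustration, and the function names are not from the source.

```python
import numpy as np

def fourth_effect(a, b, c, d):
    """Effect vector for outcome 0 of M_4 implied by a*p1 + b*p2 + c*p3 + d*p4 - 1 = 0."""
    if d == 0:
        raise ValueError("d must be non-zero for M_4 to be determined by M_1..M_3")
    return -np.array([-1.0, a, b, c]) / d

def state_vector(p1, p2, p3):
    """GPT state of the form of Eq. (D6): normalisation entry, then fiducial probabilities."""
    return np.array([1.0, p1, p2, p3])

# Hypothetical hyperplane constants and fiducial probabilities.
a, b, c, d = 0.5, 0.25, 0.25, 1.0
p = state_vector(0.2, 0.4, 0.8)

p4 = fourth_effect(a, b, c, d) @ p                       # probability of outcome 0 for M_4
residual = a * p[1] + b * p[2] + c * p[3] + d * p4 - 1.0  # should vanish
```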

To find the GPT-of-best-fit we fit a 3-d hyperplane to the eight 4-dimensional points that make up the columns of $D^r$. We then map each column of $D^r$ to its closest point on the hyperplane, and these eight points will make up the columns of $D^p$. We use a weighted total least-squares procedure [46, 47] to perform this fit. Each element of $D^r$ has an uncertainty, $\Delta r^{t'}_{t,b}$, which is estimated assuming the dominant source of error is the statistical error arising from Poissonian counting statistics. We define the weighted distance, $\chi_{t,b}$, between the $(t,b)$ column of $D^r$ and $D^p$ as

$$\chi_{t,b} = \sqrt{\sum_{t'=1}^{4} \frac{\left(r^{t'}_{t,b} - p^{t'}_{t,b}\right)^2}{\left(\Delta r^{t'}_{t,b}\right)^2}}.$$

Finding the best-fitting hyperplane can be summarized as the following minimization problem:

$$\begin{aligned} \underset{\{p^{t'}_{t,b},\,a,b,c,d\}}{\text{minimize}} \quad & \chi^2 = \sum_{t=1}^{4}\sum_{b=0}^{1} \chi^2_{t,b}, \\ \text{subject to} \quad & a p^{1}_{t,b} + b p^{2}_{t,b} + c p^{3}_{t,b} + d p^{4}_{t,b} - 1 = 0 \quad \forall\, t = 1,\ldots,4,\ b = 0,1. \end{aligned} \tag{D7}$$

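The optimization problem as currently phrased is a problem in 36 variables: the 32 elements of $D^p$ together with the hyperplane parameters $\{a, b, c, d\}$. We can simplify this by first solving the simpler problem of finding the weighted distance $\chi_{t,b}$ between the $(t,b)$ column of $D^r$ and the hyperplane $\{a, b, c, d\}$. This can be phrased as the following 8-variable optimization problem:

$$\begin{aligned} \underset{\{p^{1}_{t,b},\,p^{2}_{t,b},\,p^{3}_{t,b},\,p^{4}_{t,b}\}}{\text{minimize}} \quad & \chi^2_{t,b} = \sum_{t'=1}^{4} \frac{\left(r^{t'}_{t,b} - p^{t'}_{t,b}\right)^2}{\left(\Delta r^{t'}_{t,b}\right)^2}, \\ \text{subject to} \quad & a p^{1}_{t,b} + b p^{2}_{t,b} + c p^{3}_{t,b} + d p^{4}_{t,b} - 1 = 0. \end{aligned} \tag{D8}$$

Using the method of Lagrange multipliers |3nj , we define the Lagrange function $\Gamma = \chi^2_{t,b} + \gamma\left(a p^{1}_{t,b} + b p^{2}_{t,b} + c p^{3}_{t,b} + d p^{4}_{t,b} - 1\right)$, where $\gamma$ denotes the Lagrange multiplier, then simultaneously solve

$$\frac{\partial \Gamma}{\partial \gamma} = 0 \tag{D9}$$

and

$$\frac{\partial \Gamma}{\partial p^{t'}_{t,b}} = 0, \quad t' = 1,\ldots,4 \tag{D10}$$

for the variables $\gamma$, $p^{1}_{t,b}$, $p^{2}_{t,b}$, $p^{3}_{t,b}$ and $p^{4}_{t,b}$. Substituting the solutions for $p^{1}_{t,b}$, $p^{2}_{t,b}$, $p^{3}_{t,b}$ and $p^{4}_{t,b}$ into Eq. (D8) we find

$$\chi^2_{t,b} = \frac{\left(a r^{1}_{t,b} + b r^{2}_{t,b} + c r^{3}_{t,b} + d r^{4}_{t,b} - 1\right)^2}{\left(a \Delta r^{1}_{t,b}\right)^2 + \left(b \Delta r^{2}_{t,b}\right)^2 + \left(c \Delta r^{3}_{t,b}\right)^2 + \left(d \Delta r^{4}_{t,b}\right)^2} \tag{D11}$$

which now only contains the variables $a$, $b$, $c$, and $d$.

The hyperplane-finding problem can now be stated as the following four-variable optimization problem:

$$\underset{\{a,b,c,d\}}{\text{minimize}} \quad \chi^2 = \sum_{t=1}^{4}\sum_{b=0}^{1} \frac{\left(a r^{1}_{t,b} + b r^{2}_{t,b} + c r^{3}_{t,b} + d r^{4}_{t,b} - 1\right)^2}{\left(a \Delta r^{1}_{t,b}\right)^2 + \left(b \Delta r^{2}_{t,b}\right)^2 + \left(c \Delta r^{3}_{t,b}\right)^2 + \left(d \Delta r^{4}_{t,b}\right)^2} \tag{D12}$$

which we solve numerically.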
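
A minimal sketch of the four-variable minimization in Eq. (D12) is given below. The source only states that the problem is solved numerically; the choice of a Nelder-Mead search started from an unweighted least-squares solution, the function names, and the synthetic noiseless data are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def chi2_total(params, R, dR):
    """Objective of Eq. (D12). R and dR are (4, 8) arrays holding the raw
    frequencies r^{t'}_{t,b} and their uncertainties, one column per preparation."""
    n = np.asarray(params, dtype=float)[:, None]       # column vector (a, b, c, d)
    numerator = (np.sum(n * R, axis=0) - 1.0) ** 2
    denominator = np.sum((n * dR) ** 2, axis=0)
    return float(np.sum(numerator / denominator))

def fit_hyperplane(R, dR):
    """Return the best-fit (a, b, c, d) and the minimum chi-square."""
    R = np.asarray(R, dtype=float)
    dR = np.asarray(dR, dtype=float)
    # Starting point (an assumption): unweighted least-squares solution of R^T n = 1.
    start, *_ = np.linalg.lstsq(R.T, np.ones(R.shape[1]), rcond=None)
    result = minimize(chi2_total, start, args=(R, dR), method="Nelder-Mead",
                      options={"xatol": 1e-10, "fatol": 1e-12,
                               "maxiter": 20000, "maxfev": 20000})
    return result.x, result.fun

# Synthetic, noiseless demonstration data lying exactly on a known hyperplane.
rng = np.random.default_rng(1)
true_n = np.array([0.3, 0.3, 0.3, 1.0])
P3 = rng.uniform(0.1, 0.9, size=(3, 8))
p4 = (1.0 - true_n[:3] @ P3) / true_n[3]
R_demo = np.vstack([P3, p4])
dR_demo = np.full_like(R_demo, 0.01)
n_fit, chi2_min = fit_hyperplane(R_demo, dR_demo)
```

With noiseless input the fit recovers the generating constants and a vanishing $\chi^2$; with real data the minimum $\chi^2$ is the goodness-of-fit statistic discussed next.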

The $\chi^2$ parameter returned by the fitting procedure is a measure of the goodness-of-fit of the hyperplane to the data. Since we are fitting eight datapoints to a hyperplane defined by four fitting parameters $\{a, b, c, d\}$, we expect the $\chi^2$ parameter to be drawn from a $\chi^2$ distribution with four degrees of freedom m, which has a mean of 4. As stated in the main text, we ran our experiment 100 times and obtained 100 independent $\chi^2$ parameters; these have a mean of 3.9 ± 0.3. In addition we performed a more stringent test of the fit of the model to the data by summing the counts from all 100 experimental runs before performing a single fit. This fit returns a $\chi^2$ of 4.33, which has a p-value of 36%. The outcomes of these tests are consistent with our assumption that the raw data can be explained by a GPT in which three 2-outcome measurements are tomographically complete and which also exhibits Poissonian counting statistics. Had the fitting procedure returned $\chi^2$ values that were much higher, this would have indicated that the theoretical description of the preparation and measurement procedures required more than three degrees of freedom. On the other hand, had the fitting returned an average $\chi^2$ much lower than 4, this would have indicated that we had overestimated the amount of uncertainty in our data.
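
The quoted p-value can be reproduced with a few lines. For a $\chi^2$ distribution with an even number of degrees of freedom the tail probability has a closed form; for four degrees of freedom it is $e^{-x/2}(1 + x/2)$. The function name below is ours, chosen for illustration.

```python
import math

def chi2_sf_4dof(x):
    """Tail probability P(chi^2 >= x) for a chi-square distribution with
    four degrees of freedom, using the closed form exp(-x/2) * (1 + x/2)."""
    if x < 0:
        raise ValueError("chi-square values are non-negative")
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

p_value = chi2_sf_4dof(4.33)
print(f"p-value for chi^2 = 4.33 with 4 dof: {p_value:.2f}")  # about 0.36
```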

After finding the hyperplane-of-best-fit $\{a, b, c, d\}$, we find the points on the hyperplane that are closest to each column of $D^r$. This is done by numerically solving for $p^{1}_{t,b}$, $p^{2}_{t,b}$, $p^{3}_{t,b}$, $p^{4}_{t,b}$ in Eq. (D8) for each value of $(t,b)$.

The point on the hyperplane closest to the $(t,b)$ column of $D^r$ becomes the $(t,b)$ column of $D^p$. The matrix $D^p$ is then used to find the secondary preparations and measurements.
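
Because the constraint in Eq. (D8) is linear, the Lagrange conditions (D9) and (D10) also admit a closed-form solution, which gives a convenient cross-check on a numerical solver. The sketch below implements that closed form; it is our own illustration under the definitions above, not the authors' code, and the input values are hypothetical.

```python
import numpy as np

def closest_point_on_hyperplane(r, dr, n):
    """Minimise sum_i (r_i - p_i)^2 / dr_i^2 subject to n . p = 1.

    r  : raw frequencies for one preparation (length 4)
    dr : their uncertainties (length 4)
    n  : hyperplane constants (a, b, c, d)
    Returns the closest point p and the weighted squared distance (Eq. (D11))."""
    r, dr, n = (np.asarray(x, dtype=float) for x in (r, dr, n))
    variances = dr ** 2
    residual = n @ r - 1.0                     # how far r is from satisfying the constraint
    norm = np.sum(n ** 2 * variances)
    p = r - (residual / norm) * variances * n  # stationary point of the Lagrange function
    chi2 = residual ** 2 / norm
    return p, chi2

# Hypothetical example values.
r_col = [0.5, 0.5, 0.5, 0.5]
dr_col = [0.01, 0.02, 0.01, 0.02]
n_plane = [0.3, 0.3, 0.3, 1.0]
p_col, chi2_col = closest_point_on_hyperplane(r_col, dr_col, n_plane)
```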


2. Why is fitting to a GPT necessary? 

It is clear that one needs to assume that the measurements one has performed form a tomographically complete set, otherwise statistical equivalence relative to those measurements does not imply statistical equivalence relative to all measurements. (Recall that the assumption of preparation noncontextuality only has nontrivial consequences when two preparations are statistically equivalent for all measurements.)

The minimal assumption for our experiment would therefore be that the four measurements we perform are tomographically complete. But our physical understanding of the experiment leads us to a stronger assumption, that three measurements are tomographically complete. Here we clarify why, given this latter assumption, it is necessary to carry out the step of fitting to an appropriate GPT.

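It is again easier to begin by considering the case that our experiment is described by quantum theory. Let $(q^{1}_{t,b}, q^{2}_{t,b}, q^{3}_{t,b}, q^{4}_{t,b})$ denote the probabilities of obtaining outcome ‘0’ in measurements $M_1$, $M_2$, $M_3$, $M_4$ on preparation $P_{t,b}$, according to quantum theory, namely $q^{i}_{t,b} = \mathrm{Tr}(E_i \rho_{t,b})$, where $E_i$ is the POVM element corresponding to the 0 outcome of measurement $M_i$ and $\rho_{t,b}$ is the density operator for $P_{t,b}$.

Let us represent $\rho_{t,b} = \frac{1}{2}\mathbb{1} + \frac{1}{2}\vec{\sigma} \cdot \vec{u}_{t,b}$ by a Bloch vector $\vec{u}_{t,b}$ and the elements $E_i = v^{0}_{i}\mathbb{1} + \vec{\sigma} \cdot \vec{v}_i$ by a “Bloch four-vector” $(v^{0}_{i}, \vec{v}_i)$. Then $q^{i}_{t,b} = v^{0}_{i} + \vec{u}_{t,b} \cdot \vec{v}_i$. Since the $\vec{u}_{t,b}$ lie in a unit sphere, the $(q^{1}_{t,b}, q^{2}_{t,b}, q^{3}_{t,b}, q^{4}_{t,b})$ lie in the image of the sphere under the affine transformation $\vec{u} \mapsto (v^{0}_{1}, v^{0}_{2}, v^{0}_{3}, v^{0}_{4}) + (\vec{v}_1 \cdot \vec{u}, \vec{v}_2 \cdot \vec{u}, \vec{v}_3 \cdot \vec{u}, \vec{v}_4 \cdot \vec{u})$, i.e. some ellipsoid, a three-dimensional shape in a four-dimensional space.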
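
The affine map just described can be illustrated numerically. In the sketch below the four effects are hypothetical (three equatorial directions and one along z, each with $v^{0}_{i} = 1/2$ and $|\vec{v}_i| = 1/2$), chosen only so that the resulting operators are valid POVM elements; the check is that the four-component probability vectors span only three dimensions about their mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random pure-state Bloch vectors on the unit sphere.
u = rng.normal(size=(200, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)

# Hypothetical effects E_i = v0_i * identity + sigma . v_i.
# Their eigenvalues v0_i +/- |v_i| lie in [0, 1], so each is a valid POVM element.
v0 = np.array([0.5, 0.5, 0.5, 0.5])
V = 0.5 * np.array([
    [1.0, 0.0, 0.0],
    [-0.5, np.sqrt(3) / 2, 0.0],
    [-0.5, -np.sqrt(3) / 2, 0.0],
    [0.0, 0.0, 1.0],
])

# q[k, i] = v0_i + u_k . v_i, the outcome-0 probability of measurement i on state k.
q = v0 + u @ V.T

# The centred points occupy a three-dimensional subspace of the four-dimensional space.
rank = np.linalg.matrix_rank(q - q.mean(axis=0))
```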

However, the relative frequencies we observe will fluctuate from $q^{i}_{t,b}$ in all four dimensions. Fluctuations in the three dimensions spanned by the “Bloch ellipsoid” can be accommodated by using secondary preparations as described above. But fluctuations in the fourth direction are, according to quantum theory, always statistical and never systematic, and by the same token we cannot deliberately produce supplementary preparations that have any bias in this fourth direction. Therefore, we need to deal with these fluctuations in a different way. If one were assuming quantum theory, one would simply fit relative frequencies to the closest points $q^{i}_{t,b}$ in the Bloch ellipsoid, just as one usually fits to the closest valid density operator.

Since we do not assume quantum theory, we do not assume that the states lie in an ellipsoid. However, we still make the assumption that three two-outcome measurements are tomographically complete. Hence, by Proposition 1, the long-run probabilities lie in a three-dimensional subspace of a four-dimensional space, and so there are no supplementary preparations that can deal with fluctuations of relative frequencies in the fourth dimension. Instead of fitting to the “Bloch ellipsoid”, we fit to a suitable GPT.


3. Analysis of statistical errors 

Because the relative frequencies derived from the raw data constitute a finite sample of the true probabilities (i.e. the long-run relative frequencies), the GPT states and effects that yield the best fit to the raw data are estimates of the GPT states and effects that characterize the primary preparations and measurements.

It is these estimates that we input into the linear program that identifies the weights with which the primary procedures must be mixed to yield secondary procedures. As such, our linear program outputs estimates of the true weights, and therefore when we use these weights to mix our estimates of the GPT states and effects that characterize the primary preparations and measurements, we obtain estimates of the GPT states and effects that characterize the secondary preparations and measurements. In turn, these estimates are input into the expression for A and yield an estimate of the value of A for the secondary preparations and measurements.

To determine the statistical error on our estimate of A, we must quantify the statistical error on our estimates of the GPT states for the primary preparations and on our estimates of the GPT effects for the primary measurements. We do so by taking our experimental data in 100 distinct runs, each of which yields one such estimate. For each of these, we follow the algorithm for computing the value of A. In this way, we obtain 100 samples of the value of A for the secondary procedures, and these are used to determine the statistical error on our estimate for A.
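
The final step of this procedure, turning the per-run samples of A into an estimate with an error bar, might look like the sketch below. The source does not say whether the quoted error is the sample standard deviation or the standard error of the mean, so the helper (whose name is ours) returns both; the sample values in the usage lines are placeholders, not experimental data.

```python
import numpy as np

def summarize_runs(samples):
    """Mean, sample standard deviation, and standard error of the mean
    for a set of independent per-run estimates."""
    samples = np.asarray(samples, dtype=float)
    if samples.size < 2:
        raise ValueError("need at least two runs to estimate a spread")
    mean = samples.mean()
    std = samples.std(ddof=1)            # unbiased sample standard deviation
    sem = std / np.sqrt(samples.size)    # standard error of the mean
    return mean, std, sem

def sigmas_above(estimate, error, bound):
    """Number of error bars by which an estimate exceeds a bound."""
    return (estimate - bound) / error

# Placeholder samples for illustration only.
mean, std, sem = summarize_runs([1.0, 2.0, 3.0, 4.0, 5.0])
```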

Note that a different approach would be to presume some statistical noise model for our experiment, then input the observed relative frequencies (averaged over the entire experiment) into a program that adds noise using standard Monte Carlo techniques. Though one could generate a greater number of samples of A in this way, such an approach would be worse than the one we have adopted because the error analysis would be only as reliable as one’s assumptions regarding the nature of the noise.


Given that the quantity A we obtain is 2300σ above the noncontextual bound, we can conclude that there is a very low likelihood that a noncontextual model would provide a better fit to the true probabilities than the GPT that best fit our finite sample would. This is the sense in which our experiment rules out a noncontextual model with high confidence.

It should be noted that this sort of analysis of statistical errors is no different from that used for experimental tests of Bell inequalities. The Bell quantity (the expression that is bounded in a Bell inequality) is defined in terms of the true probabilities. Any Bell experiment, however, only gathers a finite sample of these true probabilities. From this sample, one estimates the true probabilities and in turn the value of the Bell quantity. We treat the quantity A appearing in our noncontextuality inequality in a manner precisely analogous to the Bell quantity. The definition of A in terms of the true probabilities is admittedly more complicated than for a Bell quantity: we define secondary procedures based on an optimization problem that takes as input the true probabilities for the primary procedures, and use the true probabilities for the secondary procedures to define A. But this complication does not change the fact that A is ultimately just a function of the true probabilities for the primary preparations and measurements, albeit a function that incorporates a particular linear optimization problem in its definition.


Appendix E: Experimental methods 

A 20-mW diode laser with a wavelength of 404.7 nm produces photon pairs, one horizontally polarized and the other vertically polarized, via spontaneous parametric down-conversion in a 20-mm type-II PPKTP crystal. The downconversion crystal is inside a Sagnac loop and the pump laser is polarized vertically to ensure it only travels counter-clockwise around the loop. Photon pairs are separated at a polarizing beamsplitter and coupled into two single-mode fibres (SMFs). Vertically-polarized photons are detected immediately at detector $D_h$, heralding the presence of the horizontally-polarized signal photons which emerge from the SMF and pass through a state-preparation stage before they are measured. Herald photons were detected at a rate of 400 kHz. The single-photon detection rates at the two measurement detectors depend on the measurement settings. In the transmissive and reflective ports of the Glan-Taylor PBS (GT-PBS) used in the measurement, photons were detected at maximum rates of 330 kHz and 250 kHz, respectively. Coincident detection events between herald photons and the transmissive and reflective ports of the measurement PBS were up to 22 kHz and 16 kHz, respectively.

Signal photons emerge from the fibre and pass through a Glan-Taylor PBS which transmits vertically polarised light. Polarization controllers in the fibre maximize the number of photons which pass through the beamsplitter. A quarter- and half-waveplate set the polarization of the signal photons to one of eight states.

An SMF acts as a spatial mode filter. This filter ensures that information about the angles of the state-preparation waveplates cannot be encoded in the spatial mode of the photons, and that our measurement procedures do not have a response that depends on the spatial mode, but only on polarization as intended. The SMF induces a fixed polarization rotation, so a set of three compensation waveplates are included after the SMF to undo this rotation. It follows that the preparation-measurement pairs implemented in our experiment are in fact a rotated version of the ideal preparation and a similarly-rotated version of the ideal measurement. Such a fixed rotation, however, does not impact any of our analysis.

Measurements are performed in four bases, set by a half- and quarter-waveplate. A second Glan-Taylor PBS splits the light, and both output ports are detected. Due to differences in the coupling and detection efficiencies in each path after the beamsplitter, each measurement consists of two parts. First, the waveplates are aligned such that states corresponding to outcome ‘0’ are transmitted by the PBS, and the number of heralded photons detected in a two-second window is recorded for each port. Second, the waveplate angles are changed in such a way as to invert the outcomes, so the detector in the reflected port corresponds to outcome ‘0’ and heralded photons are detected for another two seconds. The counts are added together and the probability for outcome ‘0’ is calculated by dividing the number of detections corresponding to outcome ‘0’ by the total number of detection events in the four-second window.
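
The two-part counting procedure can be written out as a short calculation. The function and argument names below are ours, and the count values in the usage lines are hypothetical. The uncertainty estimate assumes independent Poissonian counts and first-order error propagation, which for a ratio of this form reduces to $\sqrt{p(1-p)/N}$.

```python
import math

def outcome0_probability(t_first, r_first, t_second, r_second):
    """Estimate p(0) from heralded counts in the two two-second windows.

    First window:  transmitted port <-> outcome 0, reflected port <-> outcome 1.
    Second window: waveplates inverted, so reflected port <-> outcome 0.
    Returns (p0, sigma), where sigma assumes Poissonian counting statistics."""
    n_zero = t_first + r_second
    total = t_first + r_first + t_second + r_second
    if total == 0:
        raise ValueError("no detection events recorded")
    p0 = n_zero / total
    sigma = math.sqrt(p0 * (1.0 - p0) / total)  # first-order Poisson error propagation
    return p0, sigma

# Hypothetical counts for illustration.
p0, sigma = outcome0_probability(t_first=900, r_first=100, t_second=120, r_second=880)
```

Summing the two windows means each outcome is registered once by each detector, so a fixed efficiency mismatch between the two ports affects both outcomes in the same way.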



